OPTICAL COMPUTING BASED ON NEURONAL MODELS


1. INTRODUCTION

The ultimate goal of the research carried out under this grant is to
understand the computational algorithms used by the nervous system and to
develop systems that emulate, match, or surpass the computational power of
the biological brain. Tasks such as seeing, hearing, touch, walking, and
cognition are far too complex for existing sequential digital computers.
New architectures, hardware, and algorithms modeled after neural circuits
must therefore be considered in order to deal with real-world problems.

Neural  net  models  and  their  analogs  represent  a  new  approach  to 
collective  signal  processing  that  is  robust,  fault  tolerant  and  can  be 
extremely  fast.  These  properties  stem  directly  from  the  massive 
interconnectivity  of  neurons  (the  logic  elements)  in  the  brain  and  their 
ability to perform many-to-one mappings with varied degrees of nonlinearity
and  to  store  information  as  weights  of  the  links  between  them,  i.e.,  their 
synaptic  interconnections,  in  a  distributed  non-localized  manner.  As  a 
result  signal  processing  tasks  such  as  nearest  neighbor  searches  in 
associative  memory  can  be  performed  in  time  durations  equal  to  a  few  time 
constants  of  the  decision  making  elements,  the  neurons,  of  the  net.  We  note 
that  the  switching  time-constant  of  a  biological  neuron  is  of  the  order  of  a 
few milliseconds. Artificial neurons (electronic or opto-electronic
decision making elements) can be made a thousand to a million times
faster. Artificial neural nets can therefore be expected to function, for
example, as content addressable associative memory, or to perform complex
computational tasks such as the combinatorial optimization and minimization
encountered in self-organization and learning (self-programming),
computational vision, imaging, inverse scattering, super-resolution, and
automated recognition from partial (sketchy) information, at speeds that
far exceed the capability of even the most powerful present-day serial
computer. For these reasons electronic and opto-electronic analogs and
implementations of neural nets are attracting considerable attention
today. The optics in the opto-electronic implementations
provide  the  needed  parallelism  and  massive  interconnectivity  while  the 
decision  making  elements  are  realized  electronically  heralding  an  ultimate 
marriage of VLSI and optics. It should be kept in mind, however, that
research and advances in optical bistability devices (OBDs), nonlinear
optics, and optical materials promise to furnish all-optical decision
making elements as well, and eventually neural nets in which both the
interconnections and decision making are performed optically, with the
electronics being used only for control and assessment of the state of the
net. The combination of optics and electronics and the potential for
exploiting advances in opto-electronic components and materials (for
example, nonvolatile spatial light modulators for realizing programmable
synaptic or interconnectivity masks (plasticity), and OBDs for decision
making) also promise that embodiments of neural nets can be compact and have
low power consumption. Such embodiments, being primarily analog, are
rekindling interest in analog computation, whose development has been
curtailed by the explosive progress in digital computing.

In  associative  memory  applications,  the  strength  of  interconnection 
between  the  "neurons"  of  the  net  is  determined  by  the  entities  one  wishes  to 
store in the net. Usually these entities need to be in the form of
uncorrelated binary representations of the original data. Specific storage
"recipes" based on a Hebbian model of learning (the outer-product storage
algorithm), or variations thereof, are then employed to first calculate the
connectivity matrix and then set the weights of the links between neurons
accordingly. In this sense the memory is explicitly programmed, i.e., taught
what it should know and be cognizant of. This mode of programming a
net is sometimes called hard learning. What is most intriguing, however, is
that neural nets can also be made self-organizing and learning, i.e.,
self-programming (soft learning), through a process of automated
connectivity  weight  modification  driven  by  the  entities  presented  to  them 
for  learning.  This  alleviates  one  of  the  major  constraints  of  neural  nets: 
programming  complexity,  and  makes  them  a  much  more  attractive  and  powerful 
tool  for  neuromorphic  signal  and  knowledge  processing.  The  combination  of 
neural  nets,  Boltzmann  machines,  and  simulated  annealing  concepts  with  high 
speed opto-electronic implementations promises, as demonstrated by research
carried out under this grant ([1]-[2] and this report), to produce high-speed
artificial neural net processors with stochastic rules for decision
making and state update that can form their own internal representations
(connectivity weights) of the outside world data they are presented with,
regardless of whether the data is correlated or not, in a manner analogous
to the way the brain is believed to form its own symbolic representation of
reality. This is an exciting prospect with far reaching implications for
smart sensing and recognition, and for artificial intelligence as a whole.
The use of noise in stochastic learning can shed light on the way nature has
managed to turn the noise present in biological neural nets to its
advantage, and makes stochastic learning, as opposed to deterministic
learning schemes, more biologically plausible. Such learning is
probabilistic in nature, aimed at capturing the probability density function
of the environmental representation the net is exposed to. Probabilistic
learning is therefore naturally compatible with real environmental
representations that are fuzzy in nature. Exploratory work at the
University  of  Pennsylvania  is  showing  that  optics  can  play  an  important  role 
in the implementation and speeding up of adaptive learning algorithms, such
as simulated annealing and error back-propagation, in
such  self-organizing  nets  and  can  lead  to  their  use  in  automated  robust 
recognition  of  entities  the  nets  have  had  a  chance  to  learn  earlier  either 
with  or  without  the  aid  of  a  teacher  (supervised  or  unsupervised  learning) 
by  repeated  exposure  to  them  when  the  net  is  in  its  learning  mode.  One  can 
envision  modules  of  such  self-teaching  neural  nets  trained  to  recognize  and 
create  symbols  of  certain  features  found  in  natural  scenes,  patterns  or 
other  input  signals.  Such  modules  could  be  used  collectively  for  higher 
level  processing  where  their  output  symbols  are  fused  to  form  better  or  more 
reliable interpretation or assessment of the environmental input. The
implications of this for autonomous systems are obvious, but the achievement
of such scenarios requires further concerted research.

Learning in neural nets is not rote but involves generalization, i.e.,
the net can recognize an input as a member of a class of entities it became
familiar with earlier, even though that specific input was not
among those shown to it. This property can be extremely useful for
accelerating  teaching  sessions  in  that  one  need  not  think  of  and  present  to 
the  learning  net  all  possible  associations  it  is  supposed  to  recognize  in 
order to make it useful. This property, however, delegates a degree of
decision making to the net, perhaps beyond what we are ordinarily accustomed
to in signal processing systems. Thorough understanding of learning
processes and of how a network generalizes is therefore desired to alleviate
apprehensions and uncertainties stemming from the inclusion of "thinking
networks" in man-made systems that share the decision making
process with man. Such understanding can be realized only through insights
gained by theoretical analysis and with software and hardware simulation
tools. Being highly nonlinear, neural nets (as for example in higher
cortical areas) and their models are often difficult to analyze. Numerical
simulations of neural nets, even relatively small multilayered
self-organizing nets, are proving to be computationally too intensive and
therefore unacceptably time consuming, which is hindering progress in the
field. It is for this reason that analog systems in which neural net
behavior  can  be  modeled  and  studied  dynamically  at  speeds  that  can  be 
several  orders  of  magnitude  faster  than  in  numerical  simulation  are  an 
important component of our ongoing studies and the future research
directions stemming from them. This work is pointing towards treating neural
nets as nonlinear dynamical systems that are characterized by their
phase-space behavior and by the concepts of attractors, chaos, and fractal
dimensions. This
will  in  our  opinion  provide  an  infusion  of  powerful  concepts  of 
nonlinearity,  collective  behavior,  and  iterative  processing  into  optical 
processing  and  artificial  neurodynamical  systems. 

Another  intriguing  promise  of  neural  nets  is  their  ability  to  store  and 
retrieve  information  in  a  sequential  or  cyclic  manner  where  a  chain  of 
entities  can  be  stored  and  recalled  in  a  hetero-associative  sequential  or 
cyclic  fashion.  This  can  provide  a  crude  but  simple  way  for  forming, 
shaping,  and  controlling  the  limit  cycle  (trajectory)  of  a  neural  net  in  its 
phase-space. This property, together with that of generalization mentioned
earlier, is important for work in pattern recognition in general and is
being intensively studied at our Electro-Optics and Microwave Optics
Laboratory in the context of distortionless radar target recognition, as
described in earlier work (see references listed in this report and in [1]
and [2]). The results of this work are expected to be general and would be
beneficial to active and passive machine vision*.
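
The cyclic hetero-associative recall described above can be illustrated with a small numerical sketch. It is not the scheme used in the laboratory hardware: it assumes bipolar (+1/-1) state vectors, mutually orthogonal stored patterns, synchronous updates, and an outer-product rule that links each pattern to its successor. The names `store_cycle`, `step`, and `PATTERNS` are hypothetical.

```python
import numpy as np

# Three mutually orthogonal bipolar patterns of length 8 (illustrative data only).
PATTERNS = np.array([
    [1,  1,  1,  1, -1, -1, -1, -1],
    [1,  1, -1, -1,  1,  1, -1, -1],
    [1, -1,  1, -1,  1, -1,  1, -1],
])

def store_cycle(patterns):
    """Hetero-associative outer-product storage of a closed chain.

    W = sum_k x(k+1) x(k)^T, with the last pattern linked back to the first,
    so that applying W to pattern k points toward pattern k+1.
    """
    patterns = np.asarray(patterns, dtype=int)
    successors = np.roll(patterns, -1, axis=0)  # row k holds pattern k+1
    return successors.T @ patterns

def step(w, state):
    """One synchronous update with a hard threshold at zero."""
    return np.where(w @ state >= 0, 1, -1)

w = store_cycle(PATTERNS)
state = PATTERNS[0].copy()
trajectory = [state]
for _ in range(3):
    state = step(w, state)
    trajectory.append(state)  # visits pattern 1, pattern 2, then pattern 0 again
```

Because the stored patterns are orthogonal, each update lands exactly on the next pattern in the chain, so the state traces a closed trajectory (a limit cycle) in phase-space. With correlated patterns the recall would be only approximate.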

Being highly nonlinear, neural nets possess complex, rich phase-space
behavior  that  exhibits  or  is  in  principle  capable  of  exhibiting  the 
following  general  features  of  nonlinear  systems: 

o  Fixed  points  or  limit  points  in  phase-space  that  act  as  attractors 
with  prescribed  basins  of  attraction  that  are  formed  explicitly 
(hard  learning)  or  implicitly  (soft  learning)  by  Hebbian  based 
rules. 

o  Fixed  limit  cycles  or  closed  trajectories  in  phase-space  that  act 
also  as  attractors. 

o  Fixed  open  trajectories  that  act  as  attractors. 

o  Modification  and  control  of  fixed  limit  points,  limit  cycles  and 
open  trajectories  by  external  and/or  contextual  input  or  by 
adaptive  thresholding. 

o  Bifurcation  and  chaotic  behavior. 

*Machines that utilize active illumination to discern and perceive the
environment, or utilize natural scene illumination or emission for the same
purpose.

Neuromorphic signal and knowledge processing systems (whether optical,
electronic, or opto-electronic) must be able to draw upon and make use of
these features to achieve powerful signal processing functions. Such
functions include:

o  Nearest neighbor searches

o  Combinatorial optimization by minimization of cost-functions

o  Solution of ill-posed problems of the kind encountered in
vision, remote sensing, and inverse scattering

o  Feature extraction: self-organization, learning and self-programming

o  Generalization

o  Sequential and cyclic retention and recall

o  Higher order or more complex computations in phase-space (e.g.,
spoken language processing)

All the above are issues that provide motivation for our neural net and
opto-electronic implementation research, both current and future.

The ultimate realization of neuromorphic systems for wide use in signal
processing applications is not a trivial task. It requires vigorous
research and development in three primary areas: neuroscience, to increase
our understanding of the anatomical, physiological, and biochemical
properties and function of neural tissue (neural nets) in order to identify
those attributes that might help in their modeling and that can be usefully
applied in artificial systems; the study of opto-electronic architectures
and implementations; and vigorous device development based on advances in
linear and nonlinear optical materials for efficient implementation of
programmable synaptic weights (artificial plasticity) and sensitive optical
decision making elements capable of performing at lower thresholds than
present day devices. Thus synergism between a triad of research
activities is called for: neuroscience; mathematical modeling and analysis
coupled with architectures, implementations, and programming; and material
research. Our future research in neurodynamics will continue to be
influenced by developments in these fields.

2.  RESEARCH  ACCOMPLISHMENTS 

The  first  parts  of  our  research  program  under  this  grant  were  concerned 
with  the  modeling,  simulation,  and  implementation  of  fully  interconnected 
neural  networks  and  were  reported  upon  in  detail  in  two  previous  annual 
reports [1],[2]. Work with fully interconnected nets, and specifically a
comparison of inner product versus outer product schemes for associative
storage and recall (see Appendix III), revealed to us that one of the most
distinctive properties of neural nets worth considering in our
research efforts is self-organization and learning. Self-organization
(adaptivity) and learning seem to be what sets neural net processing apart
from other approaches to signal processing. Hence our efforts have since
been more concerned with learning and self-programmability in neural nets.
Learning requires layered nets in which one can clearly distinguish input,
output, and hidden (buffer) groups of neurons with prescribed communication
patterns among them. Such nets are hence not fully interconnected. To this
end  we  have  devised  a  scheme  for  partitioning  existing  opto-electronic 
vector-matrix  multiplier  architectures  into  any  number  of  desired  layers 
(see Appendices II and III). Learning requires plasticity, i.e., modifiable
weights of connections between neurons. In our work such plasticity is
achieved primarily through the use of programmable nonvolatile spatial light
modulators (SLMs) such as the magneto-optic SLM (MOSLM)*. We have devised a
new scheme for driving a commercially available 48x48 element MOSLM at a
frame refresh time of 1 msec, demonstrating thereby that synaptic
modification in a 24 binary neuron net constructed around this MOSLM can
take place, if desired, in a time period as short as 1 msec.

*Nonvolatile ferroelectric liquid crystal SLMs can also be used. These,
however, are not yet available commercially.

From  the  outset  we  have  concentrated  our  effort  on  stochastic  rather 
than  deterministic  learning  for  several  reasons.  Stochastic  or 
probabilistic  learning  is  more  compatible  with  the  uncertainty  of  most 
environments in which learning neural nets are expected to operate. In
addition, the operation and study of stochastic neural nets could shed some
light on the way nature has harnessed the noise present in biological neural
nets to work to its advantage. Learning in stochastic neural nets involves finding
the  global  minimum  of  an  energy  function  associated  with  the  network  by 
introducing  uncertainty  in  the  state  update  rule  of  neurons  in  the  net. 
Conventionally this is done by a simulated annealing algorithm in the context
of a Boltzmann machine, which was devised to be carried out on a serial
computer and is frequently used in the solution of combinatorial
optimization problems. Software implementation of the process, however, is
time consuming. For example, finding the optimal wiring layout for a typical
IC chip might take 24 hours on a mainframe computer. We have therefore
developed a method for accelerating annealing in an opto-electronic
stochastic neural net that can be several orders of magnitude faster than
serial digital methods. As a result, stochastic learning in such nets can
be sped up by the same factor. The method involves the use of noisy
thresholding of the neurons in the net, which introduces controlled shaking
of the energy landscape of the net and prevents the net from getting trapped
in a state of local energy minimum, thereby improving its chances of finding
the ground state (global minimum) or one close to it. By introducing the
noise in the net in bursts of decaying magnitude, the chances of converging
onto a low-lying energy state and staying in it are enhanced considerably.
(Such controlled annealing profiles or annealing schedules are also useful
in stochastic learning with binary weights, to be described later.) The
results of numerical and experimental simulations (see Appendix V) show that
the noisy thresholding scheme is quite effective. We used the noisy
thresholding scheme in a prototype network of 16 neurons with random bipolar
binary weights implemented in opto-electronic hardware. The results show
that the net can find the ground state in 35τ, where τ is the time constant
of the neurons in the net. This means that for a net with neurons of τ =
1 μsec response time the net can be annealed in 35 μsec, and this is
independent of the number of neurons in the net because noise in our scheme
is injected optically onto all neurons simultaneously by projecting a portion
of the snow pattern appearing on an open channel T.V. receiver onto the
photodetector array segment of the opto-electronic neural net.
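
A software sketch of annealing by noisy thresholding may help fix ideas. It is only an illustration under stated assumptions, not the opto-electronic procedure of Appendix V: the neurons are unipolar binary, the weights are symmetric bipolar binary with a zero diagonal, the energy is taken as E = -(1/2) s^T W s, the noise is Gaussian, and the units are updated one at a time (the hardware injects noise onto all neurons in parallel). The burst amplitudes and all function names are hypothetical.

```python
import numpy as np

def random_bipolar_weights(n, rng):
    """Symmetric +1/-1 weight matrix with zero diagonal (assumed form)."""
    upper = np.triu(rng.choice([-1, 1], size=(n, n)), k=1)
    return upper + upper.T

def energy(w, s):
    """Assumed network energy E = -1/2 * s^T W s."""
    return -0.5 * float(s @ w @ s)

def anneal_noisy_threshold(w, rng, burst_amplitudes=(8.0, 4.0, 2.0, 1.0),
                           sweeps_per_burst=5):
    """Threshold each neuron against its input plus noise of decaying amplitude."""
    n = w.shape[0]
    s = rng.integers(0, 2, size=n)
    for amplitude in burst_amplitudes:          # bursts of decaying magnitude
        for _ in range(sweeps_per_burst):
            for i in rng.permutation(n):
                noise = amplitude * rng.standard_normal()
                s[i] = 1 if w[i] @ s + noise > 0 else 0
    # Final noise-free sweeps: settle into the nearest stable state.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            new_value = 1 if w[i] @ s > 0 else 0
            if new_value != s[i]:
                s[i] = new_value
                changed = True
    return s
```

A typical call would be `rng = np.random.default_rng(0)`, `w = random_bipolar_weights(16, rng)`, `s = anneal_noisy_threshold(w, rng)`. The returned state is stable under noise-free updates. Whether it is the ground state depends on the noise schedule, which is chosen arbitrarily here.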

The  pixel  transmittance  of  the  MOSLM  mentioned  earlier  is  binary.  Most 
known  learning  algorithms  require  small  incremental  changes  in  the 
connection strengths between neurons of the net. This means multivalued
weights are necessary, precluding the use of MOSLMs despite their highly
desirable nonvolatile nature (storage capability). Small incremental
changes in the weights are a requisite for convergence of the learning
algorithm. To overcome this limitation we have devised a method for
stochastic learning with binary weights. The method combines multiple
time-constant annealing bursts and dead-zone limiting, as detailed in
Appendix VI.

Fast annealing by noisy thresholding and stochastic learning with
binary weights are significant developments that recently enabled successful
operation of the first bimodal stochastic optical learning machine (see
Appendix VI). The machine consists of 24 unipolar binary neurons, a 24x24
bipolar binary connectivity mask implemented in a 48x48 computer controlled
MOSLM, and LED and photodetector arrays with associated thresholding
amplifiers and LED drivers for the neurons themselves. Preliminary results
on its learning capabilities show that the net can learn a set of 3
associations in a time interval ranging from 10 minutes to 60 minutes with
relatively slow (60 msec) neurons. Preliminary results are shown in Fig. 1.
Slow neurons were chosen deliberately to permit visual observation of the
evolving state vector of the net, as represented by the LED array of the
net, during the various stages of learning.

3.  CONCLUSIONS 

Research  effort  under  this  grant  has  led  to  the  demonstration  of  the 
first  stochastic  opto-electronic  learning  machine  employing  fast  annealing  by 
noisy  thresholding  and  stochastic  learning  with  binary  weights.  The  prototype 
machine  of  24  neurons  now  operational  in  our  laboratory  provides  a  valuable 
vehicle  for  studying  the  dynamics  of  stochastic  neural  nets.  As  such,  the  net 
can  be  viewed  as  an  opto-electronic  analog  computer  that  can  perform  iterative 
mappings, do stochastic searches of the energy landscape, self-organize and
learn, and act as associative memory after learning is completed. We will
continue our studies of this and larger versions of the machine (a subject for
a renewal proposal under preparation) in order to gain better understanding of
the behavior of such machines as artificial neurodynamical systems and explore
a  host  of  intriguing  applications  involving  solution  of  combinatorial 
optimization  problems  of  the  kind  encountered  in  vision,  remote  sensing  and 
inverse  scattering. 


[Figure: Experimental Result, Learning. Number of input neurons: 8. Number of
hidden neurons: 8. Axis label: Number of learning cycles.]

4.
simulated annealing," Proc. IEEE (Letters), 75 June 1987, p. 842-
February 2-4, 1988.

"Neural Networks for Computing Conference," Snowbird, Utah, April 1988.


I. Optical Implementation of Associative Memory
Based on Models of Neural Networks


II.  Architectures  for  Optoelectronic  Analogs  of 
Self-Organizing  Neural  Networks 


III. Optoelectronic Analogs of Self-Programming Neural
Nets: Architecture and Methodologies for Implementing
Fast Stochastic Learning by Simulated Annealing


IV.  Phased-Array  Antenna  Pattern  Synthesis  by 
Simulated  Annealing 


V. Architectures and Methodologies for Self-Organization
and  Stochastic  Learning  in  Opto-Electronic  Analogs 
of  Neural  Nets 


VI. Bimodal Stochastic Optical Learning Machine

Fig. 3. Architectures for optical implementation of a content-addressable memory based on models of neural nets. (a) Matrix-vector multiplier
incorporating nonlinear electronic feedback. (b) Optoelectronic scheme for realizing binary bipolar mask transmittance in incoherent light. (c)
Optical feedback scheme incorporating hybrid optical light amplifier array. (d) Optical feedback with thin film bistable light amplifier and
programmable connectivity matrix.

Architectures for partitioning optoelectronic analogs of neural nets into input-output and internal groups to form a
multilayered net capable of self-organization, self-programming, and learning are described. The architectures and
implementation ideas given describe a class of optoelectronic neural net modules that, when interfaced to a
conventional computer controller, can impart to it artificial intelligence attributes.


In earlier work on optical analogs of neural nets,1-6 the
nets described were programmed to do a specific computational
task, namely, a nearest-neighbor search
consisting of finding the stored entity that is closest to
the address in the Hamming sense. The net acted as a
content-addressable associative memory. The programming
was done by first computing the interconnectivity
matrix using an outer-product recipe given
the entities that one wished the net to store and then
setting the weights of synaptic interconnections between
neurons accordingly.
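
The outer-product programming and nearest-neighbor recall described above can be sketched numerically as follows. This is a minimal illustration, assuming bipolar (+1/-1) pattern vectors, a zeroed diagonal, and asynchronous hard-threshold updates. It is not a description of the optical hardware, and the names `store_outer_product`, `recall`, and `STORED` are hypothetical.

```python
import numpy as np

# Two orthogonal bipolar entities to store (illustrative data only).
STORED = np.array([
    [1,  1, 1,  1, -1, -1, -1, -1],
    [1, -1, 1, -1,  1, -1,  1, -1],
])

def store_outer_product(patterns):
    """Interconnectivity matrix as the sum of outer products of the patterns."""
    patterns = np.asarray(patterns, dtype=int)
    w = patterns.T @ patterns
    np.fill_diagonal(w, 0)  # no self-connections
    return w

def recall(w, probe, max_sweeps=20):
    """Asynchronous thresholded updates until the state stops changing."""
    state = np.array(probe, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new_value = 1 if w[i] @ state >= 0 else -1
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state

w = store_outer_product(STORED)
probe = STORED[0].copy()
probe[0] = -probe[0]          # corrupt one element of the first stored entity
restored = recall(w, probe)   # settles on the stored entity nearest the probe
```

Here the probe differs from the first stored entity in a single element, so that entity is its nearest neighbor in the Hamming sense, and the iteration converges to it.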

In this Letter we are concerned with architectures
for optoelectronic implementation of neural nets that
are able to program or organize themselves under supervised
conditions, i.e., of nets that are capable of (1)
computing the interconnectivity matrix for the associations
that they are to learn and (2) changing the
weights of the links between their neurons accordingly.
Such self-organizing networks therefore have the
ability to form and store their own internal representations
of the associations that they are presented
with.

Multilayered self-programming nets were described
as early as 1969,7 and in more recent descriptions8-10
the net is partitioned into three groups. Two are
input and output groups of neurons that interface with
the net environment. The third is a group of hidden
or internal units that separates the input and output
units and participates in the process of forming internal
representations of the associations that the net is
presented with.

Two supervised learning procedures in such partitioned
nets have recently attracted attention. One is
stochastic, involving a simulated annealing process,11,12
and the other is deterministic, involving an
error backpropagation process.9 There is general
agreement, however, that because of their iterative
nature, sequential computation of the links using
these algorithms is time consuming. A faster means
for carrying out the required computations is needed.

Optics and optoelectronic architectures and techniques
can play an important role in the study and
implementation of self-programming networks and in
speeding up the execution of learning algorithms.


Here  we  describe  a  method  for  partitioning  an  opto¬ 
electronic  analog  of  a  neural  net  to  implement  a  multi¬ 
layered  net  analog  that  can  learn  stochastically  by 
means  of  a  simulated  annealing  learning  algorithm  in 
the  context  of  a  Boltzmann  machine  formalism  [see 
Fig.  1(a)].  The  arrangement  shown  in  Fig.  1(a)  de¬ 
rives  from  the  neural  network  analogs  that  we  de¬ 
scribed  earlier.2  The  network,  consisting  of,  say,  N 
neurons,  is  partitioned  into  three  groups.  Two 
groups,  V[  and  V2,  represent  input  and  output  units, 
respectively.  The  third  group,  H,  comprises  hidden 
or  internal  units.  The  partition  is  such  that  IVj  +  No  + 
-V3  =  jV,  where  Nu  No,  a^d  ,V3  refer  to  the  number  of 
neurons  in  the  Vb  V2,  and  H  groups,  respectively. 
The  interconnectivity  matrix,  designated  here  W;J,  is 
partitioned  into  nine  submatrices,  A-F,  and  three  zero 
submatrices,  shown  as  blackened  or  opaque  regions  of 
the  Wjj  mask.  The  LED  array  represents  the  state  of 
the  neurons,  assumed  to  be  unipolar  binary  (LED  on, 
neuron  firing;  LED  off,  neuron  not  firing).  The  W,, 
mask  represents  the  strengths  of  interconnection 
among  neurons  in  a  manner  similar  to  earlier  arrange¬ 
ments.2  Light  from  each  LED  is  smeared  vertically 
over  the  corresponding  column  of  the  VF,;  mask  with 
the  aid  of  an  anamorphic  lens  system  [not  shown  in 
Fig.  1(a)],  and  light  emerging  from  each  row  of  the 
mask  is  focused  with  the  aid  of  another  anamorphic 
lens  system  (also  not  shown)  onto  the  corresponding 
elements  of  the  photodetector  (PD)  array.  The 
scheme  utilized  in  Ref.  2  for  realizing  bipolar  values  of 
WtJ  in  incoherent  light  is  adopted  here;  it  consists  of 
separating  each  row  of  the  Wj,  mask  into  two  subrows, 
assigning  positive-valued  W,.  to  one  subrow  and  nega¬ 
tive-valued  Wl}  to  the  other,  and  focusing  light  emerg¬ 
ing  from  the  two  subrows  separately  onto  two  adjacent 
photosites  connected  in  opposition  in  the  photodetec¬ 
tor  array.  Submatrix  A,  with  N\  X  elements,  pro¬ 
vides  the  interconnection  weights  between  units  or 
neurons  within  group  Vj.  Submatrix  B,  with  No  X  JV2 
elements,  provides  the  interconnection  weights  be¬ 
tween  units  within  V2.  Submatrices  C  (with  N i  X  JV3 
elements)  and  D  (with  No,  X  N\  elements)  provide  the 
interconnection  weights  between  units  of  V!  and  H, 
and  submatrices  E  (with  N- >  X  N 3  elements)  and  F 
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Fig.  1.  Architecture  for  optoelectronic  analog  of  layered 
self-programming  net.  (a)  Partitioning  concept  showing 
adjustable  global  threshold  scheme,  t  b)  Arrangement  for 
rapid  determination  of  the  net's  global  energy  E  for  use  in 
learning  by  simulated  annealing. 


(with N_3 × N_2 elements) provide the interconnection
weights of units of V_2 and H. Units in V_1 and V_2
cannot communicate with one another directly because
the locations of their interconnectivity weights
in the W_ij matrix or mask are blocked out (blackened
lower-left and top-right portions of W_ij). Similarly,
units within H do not communicate with one another
because locations of their interconnectivity weights in
the W_ij mask are also blocked out (center blackened
square of W_ij). The LED element θ is of graded
response. Its output represents the state of an auxiliary
neuron in the net that is always on to provide a global
threshold level to all units by contributing only to the
light focused onto negative photosites of the PD arrays
from pixels in the G column of the interconnectivity
mask. This is achieved by suitable modulation of
pixels in the G column. This method for introducing
the threshold level is attractive, as it allows for providing
to all neurons in the net a fixed global threshold, an
adaptive global threshold, or even a noisy global
threshold.
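
The blocking pattern described above can be sketched in a few lines of Python. This is only an illustration, not the optical implementation: the function name `partition_mask`, the unit ordering V_1, H, V_2 (chosen so that the blocked V_1/V_2 regions fall in the corners and the blocked H/H region in the center), and the exclusion of self-connections are assumptions made for the sketch.

```python
def partition_mask(n1, n2, n3):
    """Return an N x N table of 0/1 flags; 1 marks a weight W_ij that is allowed.

    Assumed ordering of units along both axes: V1 (n1 units), H (n3), V2 (n2).
    """
    groups = ["V1"] * n1 + ["H"] * n3 + ["V2"] * n2
    allowed = {
        ("V1", "V1"),               # submatrix A
        ("V2", "V2"),               # submatrix B
        ("V1", "H"), ("H", "V1"),   # submatrices C and D
        ("V2", "H"), ("H", "V2"),   # submatrices E and F
    }
    n = len(groups)
    # V1<->V2 and H<->H pairs are absent from `allowed`, so they stay 0
    # (blocked out). Self-connections (i == j) are excluded by assumption.
    return [
        [1 if i != j and (groups[i], groups[j]) in allowed else 0
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    for row in partition_mask(2, 2, 1):
        print(row)
```

Multiplying a candidate weight matrix element by element with such a table is one simple way to keep the blocked regions at zero in a software simulation.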

By using a computer-controlled nonvolatile spatial
light modulator to implement the W_ij mask in Fig. 1(a)
and including a computer-controller as shown, the
scheme can be made self-programming with the ability
to modify the weights of synaptic links between its
neurons. This is done by fixing or clamping the states
of the V_1 (input) and V_2 (output) groups to each of the
associations that we want the net to learn and by
repeated application of the simulated annealing
procedure with the Boltzmann or another stochastic
state-update rule and collecting statistics on the states
of the neurons at the end of each run when the net
reaches thermodynamic equilibrium.

Starting from an arbitrary W_ij, and for each clamping
of the V_1 and V_2 units to one of the associations,
the states of units in H are switched and annealing is
applied until thermodynamic equilibrium is reached.
The state vector of the entire net, which represents a
state of the global energy minimum, is then stored by
the computer. This procedure is repeated for each
association several times and the final state vectors
are recorded every time. Note that, because of the
probabilistic nature of the state-update rule discussed later
and in Eqs. (1) and (2) below, the states of global
energy minimum in the runs for each association may
not necessarily be exactly the same. Statistics must
therefore be collected, from which the probabilities
p_ij of finding the ith and jth neurons in the same state
can then be obtained. Next, with the output units V_2
unclamped to let them run free, the above procedure is
repeated for the same number of annealings as before
and the probabilities p'_ij are obtained. The weights
W_ij are then incremented by ΔW_ij = η(p_ij - p'_ij), where
η is a constant that controls the speed and efficacy of
learning. Starting from the new W_ij, the above procedure
is repeated until a steady-state W_ij is reached, at
which stage the learning procedure is complete.
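
A minimal software sketch of the weight increment ΔW_ij = η(p_ij - p'_ij) is given here for illustration. The helper names `same_state_prob` and `update_weights`, the list-of-lists data layout, and the default value of `eta` are assumptions; the recorded state vectors would in practice come from the annealing runs described above.

```python
def same_state_prob(runs):
    """Estimate p_ij as the fraction of recorded final state vectors
    in which units i and j are found in the same state."""
    n = len(runs[0])
    return [
        [sum(1 for s in runs if s[i] == s[j]) / len(runs) for j in range(n)]
        for i in range(n)
    ]


def update_weights(w, clamped_runs, free_runs, eta=0.1):
    """Return new weights W_ij + eta * (p_ij - p'_ij).

    clamped_runs: state vectors recorded with V1 and V2 both clamped.
    free_runs:    state vectors recorded with V2 running free.
    Diagonal entries are left unchanged (assumption: no self-connections).
    """
    p = same_state_prob(clamped_runs)
    p_free = same_state_prob(free_runs)
    n = len(w)
    return [
        [w[i][j] + eta * (p[i][j] - p_free[i][j]) if i != j else w[i][j]
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    w0 = [[0.0, 0.0], [0.0, 0.0]]
    print(update_weights(w0, [[1, 1], [1, 1]], [[1, 0], [1, 1]]))
```

The sketch does not enforce the blocked regions of the mask; a simulation of the partitioned net would also have to keep those entries at zero.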

Learning by simulated annealing requires calculating
the energy E of the net8,10:

    E = -\frac{1}{2} \sum_{i} u_i s_i ,    (1)

where s_i is the state of the ith neuron and

    u_i = \sum_{j \neq i} W_{ij} s_j - \theta_i + I_i ,    (2)

in which θ_i and I_i are the threshold and external input
of the ith neuron, respectively. A simplified version of a rapid scheme
for obtaining E optoelectronically is shown in Fig.
1(b). The scheme requires the use of an electronically
addressed nonvolatile binary (on-off) spatial light
modulator (SLM) consisting of a single column of N
pixels. A suitable candidate is a parallel-addressed
magneto-optic SLM (MOSLM) consisting of a single
column of N pixels that are driven electronically by
the same signal driving the LED array in order to
represent the state vector s of the net. A fraction of
the focused light emerging from each row of the W_ij
mask is deflected by the beam splitter BS onto the
individual pixels of the column MOSLM such that
light from adjacent pairs of subrows falls upon one
pixel of the MOSLM. The MOSLM pixels are overlaid
by a checkered binary mask as shown. The
opaque and transparent pixels in the checkered mask
are staggered in such a fashion that light emerging
from the left subcolumn will be derived from the
positive subrows W_ij^+ of W_ij and light emerging from the
right subcolumn will be derived from the negative
subrows W_ij^- of W_ij. By separately focusing the light
from the left and right subcolumns as shown onto two
photodetectors and subtracting and halving their
outputs, one obtains

    E = -\frac{1}{2} \sum_{i} s_i \sum_{j} W_{ij} s_j ,    (3)

which is the required energy. In Eq. (3) the contributions
of θ_i and I_i in Eq. (2) are absorbed in W_ij. The
simulated annealing algorithm involves determining
the change ΔE_k in E that is due to switching the state
of the kth neuron in H selected at random, computing
the probability p(ΔE_k) = 1/[1 + exp(-ΔE_k/T)], and
comparing the result with a random number n_r ∈ (0, 1)
produced by a fast random-number generator. If
p(ΔE_k) > n_r, the change in the state of the kth neuron
is retained. Otherwise it is discarded and the neuron
is returned to its original state before a new neuron in
H is randomly selected and switched and the annealing
procedure repeated. This algorithm ensures that
in thermal equilibrium the relative probability of two
global states follows a Boltzmann distribution8; hence
it is sometimes referred to as the Boltzmann machine.
In this fashion the search for a state of global energy
minimum is done by employing a gradient-descent
algorithm that allows for probabilistic hill climbing.
The annealing process usually also includes a cooling
schedule in which the temperature T is allowed to
decrease between iterations to increase gradually the
fineness of search for the global energy minimum.
Optoelectronic methods for generating p(ΔE_k) that
employ speckle statistics18 and for generating the
random number n_r by photon counting14 can also be
incorporated to speed up the procedure and reduce the
number of digital computations involved.
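
The energy computation and the stochastic state update can be imitated in software as follows. This is a sketch under stated assumptions, not the optoelectronic scheme itself: states are taken as binary 0/1, thresholds and inputs are taken as absorbed in the weights, all function names are hypothetical, and ΔE_k is interpreted as the energy decrease produced by the trial switch (energy before minus energy after), so that downhill moves are the more probable ones. The text does not fix that sign convention.

```python
import math
import random


def energy(w, s):
    """E = -1/2 * sum_i sum_j W_ij s_i s_j (thresholds/inputs absorbed in W)."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def energy_split(w, s):
    """Same energy, formed from separate positive and negative weight parts,
    mimicking the two photodetector readings that are subtracted and halved."""
    n = len(s)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    pos = sum(max(w[i][j], 0.0) * s[i] * s[j] for i, j in pairs)
    neg = sum(max(-w[i][j], 0.0) * s[i] * s[j] for i, j in pairs)
    return -0.5 * (pos - neg)


def accept_probability(drop, temp):
    """p = 1 / (1 + exp(-drop / T)); `drop` is the assumed energy decrease."""
    x = -drop / temp
    if x > 700.0:          # guard against overflow in math.exp
        return 0.0
    return 1.0 / (1.0 + math.exp(x))


def anneal_step(w, s, hidden, temp, rng):
    """Switch one randomly chosen hidden unit; keep the switch if p > n_r.
    Modifies s in place and returns True when the switch is retained."""
    k = rng.choice(hidden)
    e_before = energy(w, s)
    s[k] = 1 - s[k]                      # trial switch
    drop = e_before - energy(w, s)
    if accept_probability(drop, temp) > rng.random():
        return True
    s[k] = 1 - s[k]                      # discard: restore original state
    return False


if __name__ == "__main__":
    w = [[0.0, 1.0, -2.0], [1.0, 0.0, 0.5], [-2.0, 0.5, 0.0]]
    s = [1, 0, 1]
    rng = random.Random(0)
    for temp in (2.0, 1.0, 0.5, 0.25):   # simple cooling schedule (assumed)
        anneal_step(w, s, hidden=[1], temp=temp, rng=rng)
    print(s, energy(w, s))
```

Recomputing the full energy at every trial is wasteful on a serial machine, which is the cost the parallel optical evaluation of E is meant to remove.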

The architecture described here for partitioning a
neural net can be used in hardware implementation
and study of self-programming and learning algorithms
such as the simulated annealing algorithm outlined
here. The parallelism and massive interconnectivity
provided through the use of optics should markedly
speed up learning even for the simulated
annealing algorithm, which is known to be quite time
consuming when carried out on a sequential machine.
The partitioning concept described is also extendable
to multilayered nets of more than three layers and to
the two-dimensional arrangement of synaptic inputs
to neurons, as opposed to the one-dimensional or lineal
arrangement described here. Other learning algorithms
calling for a multilayered architecture, such as
the error backpropagation algorithm,9,15 can also now
be envisaged optoelectronically by employing the
partitioning scheme described here or variations of it.

Learning algorithms in layered nets lead to analog
or multivalued W_ij. Therefore high-speed
computer-controlled SLMs with graded pixel response are
called for. Methods of reducing the needed dynamic
range of W_ij or for allowing the use of ternary W_ij are,
however, under study to permit the use of commercially
available fast nonvolatile binary SLM devices such
as  the  Litton/Semetex  MOSLM.,H  It  is  worth  noting 


that the role of optics in the architecture described not
only facilitates partitioning the net into groups or layers
but also provides the massive interconnectivity
mentioned earlier. For example, for a neural net with
a total of N = 512 neurons, the optics permit making
N^2 = 2.62 × 10^5 programmable weighted interconnections
among the neurons in addition to the 4N =
2048 interconnections that would be needed in the
arrangement shown in Fig. 1(b) to compute the energy E.
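
The figures quoted for N = 512 can be checked with a short calculation. The sketch reads the number of weighted interconnections as N^2, which is the reading that reproduces the quoted value of 2.62 × 10^5; the function name is hypothetical.

```python
def interconnection_counts(n):
    """Return the interconnection counts for a net of n neurons.

    weights:      n * n programmable weighted interconnections (assumed N^2).
    energy_links: 4 * n additional interconnections for computing E.
    """
    return {"weights": n * n, "energy_links": 4 * n}


if __name__ == "__main__":
    counts = interconnection_counts(512)
    print(counts["weights"], counts["energy_links"])
```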

Assuming that the material and device requirements of
the architectures described can be met and that
partitioned, self-organizing neural net modules will be
routinely constructed, the addition of such a module
to a computer-controller through a high-speed interface
can be viewed as providing the computer-controller
with artificial intelligence capabilities by imparting
to it neural net attributes. These capabilities include
self-organization, self-programmability and learning,
and associative memory capability for conducting
nearest-neighbor searches. Such attributes would enable
a small computer to perform powerful computational
tasks of the kind needed in pattern recognition
and in the solution of combinatorial optimization
problems and ill-posed problems encountered, for
example, in inverse scattering and vision, which are
confined at present to the domain of supercomputers.

The research reported was supported by grants
from the Defense Advanced Research Projects
Agency-Naval Research Laboratory, the U.S. Army
Research Office, and The University of Pennsylvania's
Laboratory for Research on the Structure of Matter.
Optoelectronic analogs of self-programming neural nets:
architecture and methodologies for implementing fast
stochastic learning by simulated annealing


Nabil  H.  Farhat 


Self-organization and learning is a distinctive feature of neural nets and processors that sets them apart from
conventional approaches to signal processing. It leads to self-programmability, which alleviates the problem
of programming complexity in artificial neural nets. In this paper architectures for partitioning an optoelectronic
analog of a neural net into distinct layers with prescribed interconnectivity pattern to enable stochastic
learning by simulated annealing in the context of a Boltzmann machine are presented. Stochastic learning is
of interest because of its relevance to the role of noise in biological neural nets. Practical considerations and
methodologies for appreciably accelerating stochastic learning in such a multilayered net are described.
These include the use of parallel optical computing of the global energy of the net, the use of fast nonvolatile
programmable spatial light modulators to realize fast plasticity, optical generation of random number arrays,
and an adaptive noisy thresholding scheme that also makes stochastic learning more biologically plausible.
The findings reported predict optoelectronic chips that can be used in the realization of optical learning
machines.
I. Introduction

Interest in neural net models (see, for example, Refs.
1-9) and their optical analogs (see, for example, Refs.
10-25) stems from well-recognized information processing
capabilities of the brain and the fit between
what optics can do and what even simplified models of
neural nets can offer toward the development of new
approaches to collective signal processing.

Neural net models and their analogs present a new
approach to collective signal processing that is robust,
fault tolerant, and can be extremely fast. Collective or
distributed processing describes the transfer, among
groups of simple processing units (e.g., neurons) that
communicate with each other, of information that
one unit alone cannot pass to another. These properties
stem directly from the massive interconnectivity
of neurons (the decision-making elements) in the brain
and their ability to store information as weights of
links between them, i.e., their synaptic interconnections,
in a distributed nonlocalized manner. As a
result, signal processing tasks such as nearest-neighbor
searches in associative memory can be performed in
time durations equal to a few time constants of the
decision-making elements, the neurons, of the net.
The switching time constant of a biological neuron is of
the order of a few milliseconds. Artificial neurons
(electronic or optoelectronic decision-making elements)
can be made to be a thousand to a million times
faster. Artificial neural nets can therefore be expected
to function, for example, as content-addressable
associative memory or to perform complex computational
tasks such as combinatorial optimization, which
are encountered in computational vision, imaging,
inverse scattering, superresolution, and automated
recognition from partial (sketchy) information, extremely
fast, on a time scale that is well out of reach for even
the most powerful serial computer. In fact, once a
neural net is programmed to do a given task it will do it
almost instantaneously. More about this point later.
As a result, optoelectronic analogs and implementations
of neural nets are attracting considerable attention.
Because of the noninteracting nature of photons,
the optics in these implementations provide the
needed parallelism and massive interconnectivity and
therefore a potential for realizing relatively large neural
nets, while the decision-making elements are realized
electronically, heralding a possible ultimate marriage
between VLSI and optics.

Architectures suitable for use in the implementation
of optoelectronic neural nets of 1-D and 2-D arrangements
of neurons were studied and described
earlier.10-15 Two-dimensional architectures for optoelectronic
analogs have been successfully utilized in the
recognition of objects from partial information by either
complementing the missing information or by
automatically generating correct labels of the data (object
feature spaces) the memory is presented with.23
These architectures are based primarily on the use of
incoherent light to help maintain robustness, by avoiding
speckle noise and the strict positioning requirements
encountered when use of coherent light is
contemplated.

In associative memory applications, the strengths of
interconnections between the neurons of the net are
determined by the entities one wishes to store. Ideal
storage and recall occur when the stored vectors are
randomly chosen, i.e., uncorrelated. Specific storage
recipes based on a Hebbian model of learning
(outer-product storage algorithm), or variations thereof, are
usually used to explicitly calculate the weights of
interconnections, which are set accordingly. This
represents explicit programming of the net, i.e., the net is
explicitly taught what it should know. What is most
intriguing, however, is that neural net analogs can also
be made to be self-organizing and learning, i.e., become
self-programming. The combination of neural net
modeling, Boltzmann machines, and simulated annealing
concepts with high-speed optoelectronic implementations
promises to produce high-speed artificial
neural net processors with stochastic rather than
deterministic rules for decision making and state update.
Such nets can form their own internal representations
(connectivity weights) of their environment
(the outside world data they are presented with) in a
manner analogous to the way the brain forms its own
representations of reality. This is quite intriguing and
has far-reaching implications for smart sensing and
recognition, thinking machines, and artificial intelligence
as a whole. Our exploratory work is showing
that optics can also play a role in the implementation
and speeding up of learning procedures such as simulated
annealing in the context of Boltzmann machine
formalism,-6-29-49  and  error  backpropagation30  in  such 
self-teaching nets and for their subsequent use in automated
robust recognition of entities the nets have had
a chance to learn earlier by repeated exposure to them
when the net is in a learning mode. Induced
self-organization and learning seem to be what sets apart
optical and optoelectronic architectures and processing
based on models of neural nets from other conventional
approaches to optical processing and have the
advantage of avoiding explicit programming of the net,
which can be time-consuming and has come to be
referred to as the programming complexity of neural
nets.48 The partitioning scheme presented in Sec. III
permits defining input, output, and intermediate layers
of neurons and any prescribed communication pattern
between them. This enables the implementation
of deterministic learning algorithms such as error
backpropagation. However, the discussion in this paper
focuses on stochastic learning by simulated annealing
because such learning algorithms may prove to be more
biologically plausible, as they might account for the
noise present in biological neural nets, as will be
elaborated on in Sec. IV.
In this paper we are therefore concerned with architectures
for optoelectronic implementation of neural
nets that are able to program or organize themselves
under supervised conditions, i.e., of nets that are capable
of (a) computing the interconnectivity matrix for
the associations they are to learn, and (b) changing the
weights of the links between their neurons accordingly.
Such self-organizing networks therefore have the ability
to form and store their own internal representations
of the entities or associations they are presented with.
In Sec. II we attempt to elucidate those features that
set neural processing apart from conventional approaches
to signal processing. The ideas expressed
have been arrived at as a result of maintaining a critical
attitude and constantly keeping in mind, when engaged
in the study of neural net models and their
applications, the question of what is unique about the
way they perform signal processing tasks. If they
seem to perform a signal processing function well, could
the same function be carried out equally well with a
conventional processing scheme? To gain insight into
this question we were led to a comparison between
outer-product and inner-product schemes for implementing
associative memory. The insight gained
from this exercise points clearly to certain distinctions
between neural and conventional approaches to signal
processing, which will lead us to considerations of
self-programmability and learning. These are presented
in Sec. III together with a description of architectures
for optoelectronic analogs of such self-organizing
nets. The emphasis is on stochastic supervised learning,
rather than deterministic learning, and on the use
of noise to ensure that the combinatorial search for a
global minimum of the energy or cost function during the
learning phase does not get trapped in a local minimum
of the cost function. In Sec. IV a discussion of
practical considerations related to the implementation
of the architectures described and for accelerating the
learning process is presented. An estimate of the
speedup factor compared to serial implementation is
included. Conclusions and implications of the work
are then given. These attest to a continuing role for
optics in the implementation of artificial neural net
modules or neural chips with self-programming and
learning capabilities, i.e., to optical learning machines.

II. Distinctive Features of Neural Processing

Right from the outset, when attention was first
drawn to the fit between optics and neural models,10,11
our investigations of optoelectronic analogs of neural
nets and their applications have perpetually kept in
view the question of what it is that neural nets can do
that is not doable by conventional means, i.e., by
well-established approaches to signal processing. Such a
critical attitude is found useful and almost mandatory
to avoid being swept into ill-conceived research endeavors.
It is not easy of course to see all the ramifications
of a problem while one is immersed in its study
and solution, but a critical attitude always helps to
isolate real attributes from biased ones.

Being collective, adaptive, iterative, and highly nonlinear,
neural net models and their analogs exhibit
complex and rich behavior in their phase space or state
space that is described in terms of attractors, limit
points, and limit cycles with associated basins of
attraction, bifurcation, and chaotic behavior. The rich
behavior offers intellectually attractive and challenging
areas of research. Moreover, many believe that in
studying neural nets and their models we are attempting
to benefit from nature's experience in its having
arrived over a prolonged period of time, through a
process of trial and error and retention of those
permutations that enhance the survivability of the organism,
at a powerful, robust, and highly fault-tolerant
processor, the brain, that can serve as the model for a
new generation of computing machines. Clues and
insights gained from its study can be immensely beneficial
for use in artificially intelligent man-made machines
that, like the brain, are highly suited for processing
of spatiotemporal multisensory data and for
motor control in a highly adaptive and interactive
environment.

All the above are general attributes and observations
that by themselves are sufficient justification for the
interest displayed in neural nets as a new approach to
signal processing and computation. To gain further
specific insight into what sets neural nets apart
from other approaches to signal processing, however, we
consider a specific example. This involves comparison
between two mathematically equivalent representations
of a neural net, one involving outer products and the
other inner products.31 We begin by considering the
optoelectronic neural net analog described earlier12
and represented here in Fig. 1. The iterative procedure
determining the evolution of the state vector v of
the net is illustrated in Fig. 1(a) and the vector-matrix
multiplication scheme with thresholding and feedback
used to interconnect all neurons with each other
through weights set by the T_ij mask is shown in Fig.
1(b). For a net of size N with interconnectivity matrix
T_ij, where T_ii = 0 and i,j = 1,2,...,N, the iterative equation
for the state vector is


Fig.  1.  Outer-product  (distributed)  storage  and  recall  scheme. 


    {}^{(q+1)}v_i = \mathrm{sgn}\left\{ \sum_{j=1}^{N} T_{ij}\, {}^{(q)}v_j \right\} ,    (1)


where the superscripts (q) and (q+1) designate two
consecutive iterations and sgn{·} represents the sign of
the bracketed quantity. The iteration, triggered by an
externally applied initializing or strobing vector (q)v, q
= 0, i.e., (0)v, continues until it converges on a steady-state
vector: the nominal state vector, or attractor, of the
net that is closest to (0)v in the Hamming sense.
At this point the net has completed a
nearest-neighbor search operation. For simplicity the
usual terms for the threshold θ_i and external input I_i of
the ith neuron have been omitted from Eq. (1). These
can, without loss of generality of the conclusions arrived
at below, be assumed to be zero or absorbed in the
summation in Eq. (1) through the use of two additional
always-on neurons that communicate to every other
neuron in the net its threshold and external input
levels, through appropriate weights added to T_ij.
Note in Fig. 1 that the iterated input vector is always
the transpose of the thresholded output vector.
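
The thresholded feedback iteration of Eq. (1) can be illustrated with a short software sketch. The weights are formed here by the outer-product (Hebbian) recipe mentioned in the Introduction with T_ii = 0; the function names, the synchronous update of all neurons, the convention sgn(0) = +1, and the example vectors are assumptions made for the illustration.

```python
def sgn(x):
    """Sign function; the value at zero (+1) is an assumed convention."""
    return 1 if x >= 0 else -1


def store(vectors):
    """Build T_ij as a sum of outer products of bipolar vectors, with T_ii = 0."""
    n = len(vectors[0])
    t = [[0] * n for _ in range(n)]
    for v in vectors:
        for i in range(n):
            for j in range(n):
                if i != j:
                    t[i][j] += v[i] * v[j]
    return t


def recall(t, v0, max_iter=50):
    """Iterate v <- sgn(T v) from the strobing vector v0 until a steady state
    is reached or max_iter iterations have been performed."""
    n = len(v0)
    v = list(v0)
    for _ in range(max_iter):
        nxt = [sgn(sum(t[i][j] * v[j] for j in range(n))) for i in range(n)]
        if nxt == v:
            break
        v = nxt
    return v


if __name__ == "__main__":
    stored = [[1, 1, 1, 1, -1, -1, -1, -1],
              [1, -1, 1, -1, 1, -1, 1, -1]]
    probe = [-1, 1, 1, 1, -1, -1, -1, -1]   # first stored vector, one bit flipped
    print(recall(store(stored), probe))
```

In this small example the probe differs from the first stored vector in a single position, and the iteration settles on that stored vector, its nearest neighbor in the Hamming sense.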

By substituting the expression for the storage matrix

    T_{ij} = \sum_{m=1}^{M} v_i^{(m)} v_j^{(m)} ,    (2)

formed by summing the outer products of the stored
vectors v_i^(m), i = 1,2,...,N and m = 1,2,...,M, into Eq.
(1) and interchanging the order of summations, we
obtain

    {}^{(q+1)}v_i = \mathrm{sgn}\left\{ \sum_{m=1}^{M} {}^{(q)}c_m\, v_i^{(m)} \right\} ,    (3)

where

    {}^{(q)}c_m = \sum_{j=1}^{N} v_j^{(m)}\, {}^{(q)}v_j    (4)


are coefficients determined by the inner product of the
input vector (q)v at any iteration by each of the stored
vectors. Equations (3) and (4) can be implemented
employing the optoelectronic direct storage and
inner-product recall scheme shown in Fig. 2 in which LEA
and PDA represent light emitting array and photodetector
array, respectively. Noting that the two segments
to the left and to the right of the diffuser in Fig. 2
are identical, one can arrive at the simplified equivalent
reflexive inner-product scheme shown in Fig. 3.
Now we have arrived at two equivalent implementations
of the neural model. These are shown together
in Fig. 4. One employs outer-product distributed storage
and vector-matrix multiplication with thresholded
feedback in the recall as shown in Fig. 4(a), and the
second employs direct storage and inner-product recall
with thresholded feedback as shown in Fig. 4(b).
The reflexive or inner-product scheme has several
advantages over the outer-product scheme. One is storage
capacity. While an N × N storage matrix in the
outer-product scheme can store M ≈ N/(4 ln N) vectors
of length N, beyond which the probability of correct
recall deteriorates rapidly because of proliferation of
spurious states,12 the storage mask T_im, i = 1,2,...,N, m
Fig. 2. Direct storage and inner-product recall scheme.


Fig. 3. Reflexive inner-product scheme.

Fig. 4. Two equivalent neural net analogs: (a) outer-product
distributed storage and recall with external feedback; (b) reflexive
inner-product direct storage and recall with internal feedback.

Fig. 5. Concept of nonlinear resonator content addressable
memory.


= 1,2,...,M, in the inner-product scheme can store
directly a stack of up to M = N vectors in the same size
matrix. It can be argued that the robustness and the
fault tolerance of the distributed storage scheme have
been sacrificed in the inner-product scheme, but this is
a moot argument. Robustness can be easily restored
by introducing a certain degree of redundancy in the
inner-product scheme of Fig. 4(b). This can be done,
for example, by storing a vector in more than one
location in the stack. The internal optical feedback in
the inner-product implementation is certainly another
attractive advantage. In fact the beauty of internal
feedback has inspired the concept of reflexive associative
memory or nonlinear resonator CAM (content
addressable memory) shown in Fig. 5. This scheme,
which becomes possible because the T matrix is
symmetrical, utilizes the same optics for internal feedback
and for transposing the reflected state vectors. The
scheme is perfectly suited for use with nonlinear reflector
arrays or arrays of optically bistable elements.
The advantages of a similar bidirectional associative
memory have also been noted recently elsewhere.25
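
The equivalence of the two recall schemes can be checked numerically with a small sketch. One iteration is computed both ways: through the outer-product matrix of Eq. (2) followed by Eq. (1), and through the inner-product coefficients of Eqs. (3) and (4). The function names and the convention sgn(0) = +1 are assumptions, and the diagonal terms of T_ij are kept here (rather than set to zero) so that the substitution holds exactly.

```python
import random


def sgn(x):
    """Sign function; the value at zero (+1) is an assumed convention."""
    return 1 if x >= 0 else -1


def outer_product_step(stored, v):
    """One iteration via T_ij = sum_m v_i^(m) v_j^(m), diagonal terms kept."""
    n = len(v)
    t = [[sum(s[i] * s[j] for s in stored) for j in range(n)] for i in range(n)]
    return [sgn(sum(t[i][j] * v[j] for j in range(n))) for i in range(n)]


def inner_product_step(stored, v):
    """One iteration via c_m = sum_j v_j^(m) v_j, then sgn(sum_m c_m v_i^(m))."""
    c = [sum(sj * vj for sj, vj in zip(s, v)) for s in stored]
    return [sgn(sum(cm * s[i] for cm, s in zip(c, stored)))
            for i in range(len(v))]


if __name__ == "__main__":
    rng = random.Random(3)
    n, m = 16, 3
    stored = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]
    probe = [rng.choice((-1, 1)) for _ in range(n)]
    print(outer_product_step(stored, probe) == inner_product_step(stored, probe))
```

The inner-product form needs only the M stored vectors and M coefficients per iteration, and no N × N weights matrix is ever formed.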

In view of the obvious advantages of the reflexive
scheme [Fig. 4(b)], one is led to ask why
nature appears to prefer distributed (Hebbian) storage
[as in Fig. 4(a)] over localized storage [as in Fig. 4(b)],
besides fault tolerance and redundancy. As a result of
the preceding exercise the answer now comes readily to
mind: in the inner-product scheme the connectivity
matrix T_ij is not present. Self-organization and learning
in biological systems are associated with modifications
of the synaptic weights matrix. Hence learning
in the neural sense is not possible in the inner-product
scheme. In this sense the inner-product scheme of
Fig. 4(b) is not neural but involves conventional correlations
between the input vectors and the stored vectors.
One can argue that the instant the identity of the
weights matrix T_ij was obliterated the inner-product
network stopped being neural, as learning through
weights modification is no longer possible. We are
therefore led to conclude that distributed storage and
self-organization and learning are the most distinctive
features of neural signal processing as opposed to
conventional approaches to signal processing such as in
the inner-product scheme, which involves simple correlations
and where it is not clear how self-organization
and learning can be performed since there is no T_ij
matrix to be modified.

Neural net processing has additional attractive features
that are not as distinctive as self-organization
and learning. These include heteroassociative storage
and recall where the same net performs the functions
of storage, processing, and labeling of the output (final
state) simultaneously. While such a task may also be
realized with conventional signal processing nets, each
of the above three functions must however be realized
separately in a different subnet. A striking example of
this feature reported recently23 is in the area of radar
target recognition from partial information employing
sinogram representation of targets of interest. The
sinogram representations were used in computing and
setting the synaptic weight matrix in an explicit learning
mode. Recognition in radar from partial information
is tantamount to solution of the superresolution
problem. The ease and elegance with which the neural
net approach solves this classical problem is, to say
the least, impressive.

Other distinctive features of neural nets associated
with the rich phase-space behavior are bifurcation and
chaotic behavior. These were mentioned earlier but
are restated here because of their importance in
sequential processing of data (e.g., cyclic heteroassociative
memory) and in the modeling and study of mental
disorder and the effect of drugs on the nervous
system (Ref. 33).

III.  Partitioning Architectures and Stochastic Learning by Simulated Annealing

In preceding work on optical analogs of neural
nets (Refs. 10-25), the nets described were programmed to
do a specific computational task, namely, a nearest-neighbor
search that consisted of finding the stored entity
that is closest to the address in the Hamming sense.
As such the net acted as a content addressable associative
memory. The programming was done by first
computing the interconnectivity matrix using a Hebbian
(outer-product) recipe given the entities one
wished the net to store, followed by setting the weights
of synaptic interconnections between neurons
accordingly.
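
A minimal sketch of this explicit programming step and of the
resulting nearest-neighbor recall is given below. It makes
several assumptions not taken from the text: bipolar (+1/-1)
states instead of the unipolar binary states of the optical
hardware, a zero diagonal, zero thresholds, and synchronous
updates. The function names are illustrative.

```python
def hebbian_matrix(patterns):
    """Outer-product (Hebbian) recipe: T_ij = sum over patterns of p_i * p_j."""
    n = len(patterns[0])
    T = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # assumed: no self-connection
                    T[i][j] += p[i] * p[j]
    return T


def recall(T, probe, max_sweeps=10):
    """Iterate thresholded updates until the state stops changing."""
    v = list(probe)
    n = len(v)
    for _ in range(max_sweeps):
        u = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Zero threshold; a neuron keeps its state when its activation is 0.
        new_v = [1 if u_i > 0 else -1 if u_i < 0 else v_i
                 for u_i, v_i in zip(u, v)]
        if new_v == v:
            break
        v = new_v
    return v


def hamming(a, b):
    """Number of positions in which two state vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```

With two stored vectors of length 8 and a probe that differs from
the first in a single position, recall returns the first stored
vector, i.e., the one closest to the probe in the Hamming sense.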

In this section we are concerned with architectures
for optoelectronic implementation of neural nets that
are able to program or organize themselves under
supervised conditions. Such nets are capable of (a)
computing the interconnectivity matrix for the associations
they are to learn, and (b) changing the weights of
the links between their neurons accordingly. Such
self-organizing networks therefore have the ability to
form and store their own internal representations of
the associations they are presented with. The discussion
in this section is an expansion of one given
earlier (Ref. 34).

Multilayered self-programming nets have recently
been attracting increasing attention (see, e.g., Refs. 28,
30, and 35). For example, in Ref. 28 the net is partitioned
into three groups. Two are input and output groups of
neurons that interface with the net environment, and the
third is a group of hidden or internal units that acts as a
buffer between the input and output units and participates
in the process of forming internal representations
of the associations the net is presented with.
This can be done, for example, by clamping or fixing
the states of neurons in the input and output groups to
the desired pairs of associations and letting the net run
through its learning algorithm to arrive ultimately at a
specific set of synaptic weights or links between the
neurons. No neuron or unit in the input group is
linked directly to a neuron in the output group and vice
versa. Any such communication must be carried out
via the hidden units. Neurons within the input group
can communicate among each other and with hidden
units, and the same is true for neurons in the output
group. Neurons in the hidden group cannot communicate
among each other. They can only communicate
with neurons in the input and output groups as stated
earlier.

Fig. 6. Optoelectronic analog of self-organizing neural net
partitioned into three layers capable of stochastic self-programming
and learning.

Two supervised learning procedures in multilayered
nets have recently attracted attention. One is stochastic,
involving a simulated annealing process (Ref. 36),
and the other is deterministic, involving an error
backpropagation process (Ref. 30). There is general
agreement, however, that because of their iterative nature,
sequential computation of the weights using these
algorithms is very time-consuming. A faster means for
carrying out the required computations is needed.
Nevertheless, the work mentioned represents a milestone
in that it opens the way for powerful collective
computations in multilayered neural nets, and the
partitioning concept dispels earlier reservations (Ref. 36)
about the capabilities of early single-layered models of
neural nets such as the Perceptron (Ref. 37). The partitioning
feature and the ability to define input and output
neurons may also be the key for realizing meaningful
interconnection between neural modules for the purpose
of performing higher-order hierarchical processing.

Optics and optoelectronic architectures and techniques
can play an important role in the study and
implementation of self-programming networks and in
speeding up the execution of learning algorithms.
Here we describe a method for partitioning an
optoelectronic analog of a neural net to implement a
multilayered net that can learn stochastically by means of a
simulated annealing learning algorithm in the context
of a Boltzmann machine formalism (see Fig. 6). The
arrangement shown in Fig. 6 derives from the neural
network analogs we described earlier (Ref. 12). The network,
consisting of, say, N neurons, is partitioned into three
groups. Two groups, V_1 and V_2, represent visible units
that can be viewed as input and output groups,
respectively. The third group, H, consists of hidden or
internal units.
The partition is such that N_1 + N_2 + N_3 = N, where N_1,
N_2, and N_3 refer to the number of neurons in the V_1, V_2,
and H groups, respectively. The interconnectivity
matrix, T_ij, is partitioned into six submatrices, A, B, C,
D, E, and F, and three zero-valued submatrices shown as
blackened or opaque regions of the T_ij mask. The
LED array represents the state of the neurons, assumed
to be unipolar binary (LED on = neuron firing,
LED off = neuron not firing). The T_ij mask represents
the strengths of interconnection between neurons
in a manner similar to earlier arrangements (Ref. 12).
Light from each LED is smeared vertically over the
corresponding column of the T_ij mask with the aid of
an anamorphic lens system (not shown in Fig. 6), and
light emerging from each row of the mask is focused
with the aid of another anamorphic lens system (also
not shown) onto the corresponding elements of the
photodetector (PD) array. The same scheme utilized
in Ref. 12 for realizing bipolar values of T_ij in incoherent
light is assumed here, namely, separating each row
of the T_ij mask into two subrows and assigning
positive-valued T_ij to one subrow and negative-valued T_ij
to the other, and focusing light emerging from the two
subrows separately on two adjacent photosites on the
photodetector array connected in opposition.
Submatrix A, with N_1 × N_1 elements, provides the
interconnection weights between units or neurons within group
V_1. Submatrix B, with N_2 × N_2 elements, provides the
interconnection weights between units within V_2.
Submatrices C (with N_1 × N_3 elements) and D (with N_3
× N_1 elements) provide the interconnection weights
between units of V_1 and H, and submatrices E (with N_2
× N_3 elements) and F (with N_3 × N_2 elements) provide
the interconnection weights between units of V_2 and H.
Units in V_1 and V_2 cannot communicate among each
other directly because locations of their interconnectivity
weights in the T_ij matrix or mask are blocked out
(blackened lower left and top right portions of T_ij).
Similarly, units within H do not communicate among
each other because locations of their interconnectivity
weights in the T_ij mask are also blocked out (blackened
center square of T_ij). The LED element θ is of graded
response. Its output represents the state of an auxiliary
neuron in the net that is always on to provide a
global threshold level to all units by contributing only
to the light focused onto negative photosites of the
photodetector (PD) arrays from pixels in the G column
of the interconnectivity mask. This is achieved by
suitable modulation of the transmittance of pixels in
the G column. This method for introducing the
threshold level is attractive, as it allows for providing
to all neurons in the net a fixed global threshold, an
adaptive global threshold, or even a noisy global
threshold if desired.
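
The block structure of the partitioned mask can be sketched as
follows. The sketch assumes that neurons are ordered V_1, H, V_2
along both rows and columns (which places the blocked H-H block
in the center and the blocked V_1-V_2 blocks in the corners, as
described above) and that rows index the receiving neuron. The
function name and the use of "0" for opaque regions are
illustrative choices; the threshold (G) column is omitted.

```python
def partition_layout(n1, n2, n3):
    """Return an N x N grid of block labels for the partitioned T_ij mask.

    Labels A-F mark programmable submatrices; "0" marks opaque regions.
    """
    groups = ["V1"] * n1 + ["H"] * n3 + ["V2"] * n2
    # (row group, column group) -> submatrix label, sizes as in the text:
    # A: N1 x N1, B: N2 x N2, C: N1 x N3, D: N3 x N1, E: N2 x N3, F: N3 x N2
    label = {
        ("V1", "V1"): "A", ("V2", "V2"): "B",
        ("V1", "H"): "C", ("H", "V1"): "D",
        ("V2", "H"): "E", ("H", "V2"): "F",
    }
    # Pairs not listed (V1-V2, V2-V1, H-H) are the zero-valued submatrices.
    return [[label.get((row, col), "0") for col in groups] for row in groups]


if __name__ == "__main__":
    for line in partition_layout(2, 3, 1):
        print(" ".join(line))
```

Printing the grid for small group sizes gives a quick visual check
that the opaque regions fall where Fig. 6 places them.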

By using a computer-controlled nonvolatile spatial
light modulator to implement the T_ij mask in Fig. 6
and including a computer controller as shown, the
scheme can be made self-programming with the ability to
modify the weights of synaptic links between its
neurons. This is done by fixing or clamping the states of
the V_1 (input) and V_2 (output) groups to each of the
associations we want the net to learn and by repeated
application of the simulated annealing procedure with
the Boltzmann, or another stochastic, state update rule, and
collection of statistics on the states of the neurons at
the end of each run when the net reaches thermodynamic
equilibrium.

Stochastic  learning  by  simulated  annealing  in  the 
partitioned  net  proceeds  as  follows: 


(1) Starting from an arbitrary T_ij, clamp V_1 and V_2
to the desired association, keeping H free running.

(2) Randomly select a neuron in H, say the kth
neuron, and flip its state [recall we are dealing with
binary (0,1) neurons].

(3) Determine the change ΔE_k in global energy E of
the net caused by changing the state of the kth neuron.

(4) If ΔE_k < 0, adopt the change.

(5) If ΔE_k > 0, do not discard the change outright
but calculate first the Boltzmann probability factor,

    \[ P_k = \exp\left( -\frac{\Delta E_k}{T} \right) , \qquad (5) \]

and compare the outcome to a random number N_r ∈
[0,1]. If P_k > N_r, adopt the change of state of the kth
neuron even if it leads to an energy increase (i.e., ΔE_k >
0). If P_k < N_r, discard the change, i.e., return the kth
neuron to its original state.

(6) Once more select a neuron in H randomly and
repeat steps (1)-(5).

(7) Repeat steps (1)-(6), reducing at every round the
temperature T gradually [e.g., T = T_0/log(1 + m),
where m is the round number, is a cooling schedule
frequently used to ensure convergence] until a situation is
reached where changing states of neurons in H does not
alter the energy E, i.e., ΔE_k → 0. This indicates that a state
of thermodynamic equilibrium or a state of global
energy minimum has been reached. The temperature T
determines the fineness of search for a global
minimum. A high T produces a coarse search and a low T a
finer grained search.

(8) Record the state vector at thermodynamic
equilibrium, i.e., the states of all neurons in the net, i.e.,
those in H and those in V_1 and V_2 that are clamped.

(9) Repeat steps (1)-(8) for all other associations on
V_1 and V_2 we want the net to learn and collect statistics
on the states of all neurons by storing the states at
thermodynamic equilibrium in computer memory as
in step (8). This completes the first phase of exposing
the net to its environment.

(10) Generate the probabilities P_ij of finding the ith
neuron and the jth neuron in the same state. This
completes phase I of the learning cycle.

(11) Unclamp neurons in V_2, letting them run free as
with neurons in H.

(12) Repeat steps (1)-(10) for all input vectors V_1
and collect statistics on the states of all neurons in the
net.

(13) Generate the probabilities P'_ij of finding neuron
i and neuron j in the same state.

(14) Increment the current connectivity matrix T_ij
by ΔT_ij = ε(P_ij − P'_ij), where ε is a constant representing
and controlling the speed of learning. This completes
phase II of the learning cycle.

(15) Repeat steps (1)-(14) again and again until the
increments ΔT_ij tend to zero, i.e., become smaller than
some prescribed small number. At this point the net is
said to have captured the underlying structure or
formed its own representations of its environment
defined by the associations presented to it. We are now
dealing with a learned net.

One can make the following observations regarding
the above procedure:

The search for a state of global energy minimum is
basically a gradient descent procedure that allows for
probabilistic hill climbing to avoid entrapment in a
state of local energy minimum. The relative probability
of two global states α and β is given by the Boltzmann
distribution P_α/P_β = exp[-(E_α - E_β)/T], hence
the name Boltzmann machine (Ref. 38). Therefore the lowest
energy state is the most probable at any temperature
and is sought by the procedure.
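
The probabilistic hill climbing of steps (1)-(7) can be sketched
in a few lines of Python. This is only an illustrative serial
sketch under stated assumptions: the energy is taken as
E = -(1/2) Σ_i Σ_j T_ij v_i v_j with thresholds absorbed into
T_ij, E is recomputed from scratch before and after each flip for
clarity rather than speed, and the function names and the default
values of t0, rounds, and flips_per_round are hypothetical.

```python
import math
import random


def energy(T, v):
    """Global energy with thresholds assumed absorbed into T."""
    n = len(v)
    return -0.5 * sum(T[i][j] * v[i] * v[j]
                      for i in range(n) for j in range(n))


def anneal_hidden(T, v, hidden, t0=2.0, rounds=50, flips_per_round=20,
                  rng=None):
    """Anneal the free (hidden) units while all other units stay clamped."""
    rng = rng or random.Random(0)
    v = list(v)
    for m in range(1, rounds + 1):
        temp = t0 / math.log(1 + m)        # cooling schedule of step (7)
        for _ in range(flips_per_round):
            k = rng.choice(hidden)         # step (2): pick a hidden neuron
            e_old = energy(T, v)
            v[k] = 1 - v[k]                # flip its binary (0,1) state
            delta = energy(T, v) - e_old   # step (3): change in energy
            # Steps (4)-(5): keep downhill moves; keep an uphill move only
            # if the Boltzmann factor exceeds a uniform random number.
            if delta > 0 and math.exp(-delta / temp) <= rng.random():
                v[k] = 1 - v[k]            # discard: restore original state
    return v
```

At high temperature most uphill moves are kept, giving a coarse
search; as the temperature falls the acceptance of uphill moves
becomes rare and the search becomes finer grained.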

Unlike explicit programming of a neural net, where
lack of correlation among the stored vectors is a
prerequisite for ideal storage and recall, self-programming
by simulated annealing has no such requirement.
In fact, learning by simulated annealing in a Boltzmann
machine looks for underlying similarities or correlations
in the training set to generate weights that can
make the net generalize. Generalization is a property
where the net recognizes an entity presented to it even
though it was not among those specifically used in the
learning session. Learning is thus not rote.

The final T_ij reached represents a net that has
learned its environment by itself under supervision,
i.e., it has formed its own internal representations of its
surroundings. Those environmental states or input/output
associations that occur more frequently will
influence the final T_ij more than others and hence form
more vivid impressions in the synaptic memory matrix
T_ij.

The learning procedure is stochastic but is still basically
Hebbian in nature, where the change in the synaptic
interconnection between two units (neurons) depends
on finding the two units in the same state
(sameness reinforcement rule).

Evidently, being stochastic in nature (involving
probabilistic state transition rules and simulated
annealing), the learning procedure is lengthy (taking
hours in a digital simulation for nets of a few tens to a
few hundred neurons). Hence, speeding up the process
by using analog optoelectronic implementation is
highly desirable.

Stochastic learning consists of two phases. Phase I
involves generating the probabilities P_ij when the input
and output of the net are specified. Phase II involves
generating the probabilities P'_ij when only the input is
specified while the rest of the net is free running,
followed by computing the weight increments and
modifying the T_ij matrix accordingly.
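
The bookkeeping of the two phases, i.e., steps (10), (13), and
(14), can be sketched as below. The sketch assumes that the
equilibrium state vectors recorded in each phase are already
available as lists of binary states; the function names and the
default learning constant eps are illustrative, and the partition
mask is ignored for brevity.

```python
def same_state_probabilities(states):
    """Estimate P_ij: the fraction of recorded states in which neurons
    i and j are found in the same state."""
    n = len(states[0])
    counts = [[0] * n for _ in range(n)]
    for s in states:
        for i in range(n):
            for j in range(n):
                if s[i] == s[j]:
                    counts[i][j] += 1
    return [[c / len(states) for c in row] for row in counts]


def boltzmann_update(T, clamped_states, free_states, eps=0.1):
    """Apply the increment eps * (P_ij - P'_ij) to every weight.

    Returns the new matrix and the largest increment magnitude, which
    can serve as the stopping test of step (15).
    """
    p = same_state_probabilities(clamped_states)    # phase I statistics
    p_free = same_state_probabilities(free_states)  # phase II statistics
    n = len(T)
    delta = [[eps * (p[i][j] - p_free[i][j]) for j in range(n)]
             for i in range(n)]
    new_T = [[T[i][j] + delta[i][j] for j in range(n)] for i in range(n)]
    biggest = max(abs(d) for row in delta for d in row)
    return new_T, biggest
```

When the free-running statistics match the clamped statistics, all
increments vanish and the learning cycle stops.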

IV.  Accelerated  Learning 

Stochastic learning by the simulated annealing
procedure we described was originally conceived for serial
computation. When dealing with parallel optical
computing systems it does not make sense to exactly
follow a serial algorithm. Modifications that can take
advantage of the available parallelism of optics to
speed up stochastic learning are therefore of interest.
In this section we discuss several such modifications
that offer potential for speeding up stochastic learning
in optoelectronic implementations by several orders of
magnitude compared to serial digital implementation.

Learning by simulated annealing requires calculating
the energy E of the net (Refs. 7 and 38),

    \[ E = -\frac{1}{2} \sum_i u_i v_i , \qquad (6) \]

where v_i is the state of the ith neuron and

    \[ u_i = \sum_{j \neq i} T_{ij} v_j - \theta_i + I_i \qquad (7) \]

is the activation potential of the ith neuron, with θ_i and
I_i being the threshold level and external input to the
ith neuron, respectively, and the summation term
representing the input to the ith neuron from all other
neurons in the net. By absorbing θ_i and I_i in the
summation term as described earlier, Eq. (7) can be
simplified to

    \[ u_i = \sum_j T_{ij} v_j . \qquad (8) \]

A simple analog circuit for calculating the contribution
E_i of the ith neuron to the global energy E of the
net is shown in Fig. 7(a). Here the product of the
activation potential of the ith neuron and the state v_i of
the ith neuron is formed to obtain E_i, which is then
added to all terms formed similarly in parallel for all
other neurons in the net. Although VLSI implementation
of such an analog circuit for parallel calculation
of the global energy is feasible, this becomes less
attractive as the number of neurons increases because of
the interconnection problem associated with the large
fan-in at the summation element.

A simplified version of a rapid scheme for obtaining
E optoelectronically is shown in Fig. 7(b). The scheme
requires the use of an electronically addressed nonvolatile
binary (on-off) spatial light modulator consisting
of a single column of N pixels. A suitable candidate is
a parallel addressed magnetooptic spatial light modulator
(MOSLM) consisting of a single column of N
pixels that are driven electronically by the same signal
driving the LED array to represent the state vector v of
the net. A fraction of the focused light emerging from
each row of the T_ij mask is deflected by the beam
splitter (BS) onto the individual pixels of the column
MOSLM such that light from adjacent pairs of
subrows of T_ij falls on one pixel of the MOSLM. The
MOSLM pixels are overlaid by a checkered binary
mask as shown. The opaque and transparent pixels in
the checkered mask are staggered in such a fashion
that light emerging from the left subcolumn will originate
from the positive subrows T_ij^+ of T_ij only and light
emerging from the right subcolumn will originate from
the negative subrows T_ij^- of T_ij. By separately focusing
the light from the left and right subcolumns as shown
onto two photodetectors and subtracting and halving
their outputs, one obtains

    \[ E = -\frac{1}{2} \left[ \sum_i \sum_j T^{+}_{ij} v_j v_i - \sum_i \sum_j T^{-}_{ij} v_j v_i \right] , \]

which is the required global energy.

Fig. 7. Two schemes for parallel computing of the global energy in
an optoelectronic analog of a multilayered self-organizing net: (a)
electronic scheme; (b) optoelectronic scheme.
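
The two routes to the global energy can be compared numerically.
The sketch below assumes thresholds are absorbed into T_ij, as in
the simplified activation potential, and models the positive and
negative subrows as two nonnegative matrices whose difference is
T_ij. The function names are illustrative and not taken from the
text.

```python
def neuron_contributions(T, v):
    """Per-neuron terms E_i = -(1/2) u_i v_i, with u_i = sum_j T_ij v_j."""
    u = [sum(t_ij * v_j for t_ij, v_j in zip(row, v)) for row in T]
    return [-0.5 * u_i * v_i for u_i, v_i in zip(u, v)]


def split_bipolar(T):
    """Split T into nonnegative positive and negative subrow matrices."""
    t_pos = [[max(t, 0) for t in row] for row in T]
    t_neg = [[max(-t, 0) for t in row] for row in T]
    return t_pos, t_neg


def energy_from_subrows(T, v):
    """Sum the two subrow signals separately, then subtract and halve."""
    t_pos, t_neg = split_bipolar(T)
    n = len(v)
    pos = sum(t_pos[i][j] * v[j] * v[i] for i in range(n) for j in range(n))
    neg = sum(t_neg[i][j] * v[j] * v[i] for i in range(n) for j in range(n))
    return -0.5 * (pos - neg)
```

Summing the per-neuron contributions and evaluating the subrow
form give the same number, since T_ij is the difference of the two
subrow matrices.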

The learning procedure detailed in Sec. III requires
fast random number generation for use in random
drawing and switching of state of neurons from H
(during phase I of learning) and from H and V_2 (during
phase II of learning). Another random number is also
needed to execute the stochastic state update rule
when ΔE_k > 0. Although fast digital pseudorandom
number generation of up to 10^9 s^-1 is feasible (Ref. 39)
and can be used to help speed up digital simulation of the
learning algorithm, this by itself is not sufficient to
make a large impact, especially when the total number
of neurons in the net is large. Optoelectronic random
number generation is also possible, although at a slower
rate of 10^5 s^-1. Despite the slower rate of generation,
optoelectronic methods have advantages that will be
elaborated on below. An optoelectronic method for
generating the Boltzmann probability factor p(ΔE_k)
[see Eq. (5)] employing speckle statistics is described
in Ref. 40, and optical generation of random number
arrays by photon counting image acquisition systems
or clipped laser speckle has also been recently
described (Refs. 41-44). These photon counting image
acquisition systems have the advantage of being able to
generate normalized random numbers with any probability
density function. A more important advantage of
optical generation of random number arrays, however, is
the ability to exploit the parallelism of optics to modify
the simulated annealing and the Boltzmann machine
formalism detailed above to achieve significant
improvement in speed. As stated earlier, with parallel
optical random number generation, a spatially and
temporally uncorrelated linear array of percolating
light spots of suitable size can be generated and imaged
on the photodetector array (PDA) of Fig. 6 such that
both the positive and negative photosites of the PDA
[see also Fig. 7(a)] are subjected to random irradiance.
This introduces a random (noise) component in θ_i and
I_i of Eq. (7), which can be viewed as a bipolar noisy
threshold. The noisy threshold produces in turn a
noisy component in the energy in accordance with Eq.
(6). The magnitude of the noise components can be
controlled by varying the standard deviation of the
random light intensity array irradiating the PDA.
The noisy threshold therefore produces random controlled
perturbation or shaking of the energy landscape
of the net. This helps shake the net loose whenever
it gets trapped in a local energy minimum. The
procedure can be viewed as generating a controlled
deformation or tremor in the energy landscape of the
net to prevent entrapment in a local energy minimum
and thereby ensure convergence to a state of global
energy minimum. Both the random drawing of neurons
(more than one at a time is now possible) and the
stochastic state update of the net are now done in
parallel at the same time. This leads to significant
acceleration of the simulated annealing process. The
parallel optoelectronic scheme for computing the global
energy described earlier [see Fig. 7(b)] can be used to
modulate the standard deviation of the optical random
noise array used to produce a noisy threshold with a
function of the instantaneous global energy E and/or
its time rate of change dE/dt. In this fashion an adaptive
noisy threshold scheme can be realized to control the
tremors in the energy landscape if necessary. The
above discussion gives an appreciation of the advantages
and flexibility of using optical random array
generators in making the net rapidly find states of
global energy minimum. No attempt is made here to
estimate in detail the speed enhancement over digital
execution of the simulated annealing process, as this
will be dependent on the characteristics of the light
emitting array, the photodetector array, the spatial
light modulator, and the speed of the computer-controller
interface used. Nevertheless, the enhancement
over digital serial computation can be significant,
approaching 5-6 orders of magnitude, especially for
relatively large multilayer nets consisting of from a few
tens to a few hundred neurons. A recent study of
learning in neuromorphic VLSI systems in the context
of a modified Boltzmann machine gives speedup
estimates of 10^6 over serial digital simulations (Ref. 45).
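
A software caricature of the noisy-threshold scheme is sketched
below. It is not a model of the optical hardware: the Gaussian
noise, the fully synchronous update of all free neurons, the
hypothetical schedule sigma0 / m for the noise standard deviation,
and the function names are all assumptions made for illustration.
An adaptive scheme would replace the fixed schedule with a
function of E or dE/dt.

```python
import random


def noisy_parallel_sweep(T, v, free, sigma, rng):
    """Update every free neuron at once using a noisy threshold.

    Each activation u_i is perturbed by zero-mean noise of standard
    deviation sigma before being compared with zero.
    """
    n = len(v)
    u = [sum(T[i][j] * v[j] for j in range(n)) for i in range(n)]
    new_v = list(v)
    for i in free:  # clamped neurons are left untouched
        new_v[i] = 1 if u[i] + rng.gauss(0.0, sigma) > 0 else 0
    return new_v


def anneal_with_noise(T, v, free, sigma0=2.0, sweeps=30, rng=None):
    """Shrink the noise level sweep by sweep (assumed schedule sigma0 / m)."""
    rng = rng or random.Random(0)
    v = list(v)
    for m in range(1, sweeps + 1):
        v = noisy_parallel_sweep(T, v, free, sigma0 / m, rng)
    return v
```

Because the random drawing and the state update are merged into a
single noisy comparison, one sweep replaces many serial flip and
accept steps, which is where the speedup discussed above would
come from.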

V.  Optoelectronic  Neural  Chip 

The discussion in the preceding sections shows that
optical techniques can simplify and speed up stochastic
learning in artificial neural nets and make them
more practical. The attractiveness and practicality of
optoelectronic analogs of self-programming and learning
neural nets are enhanced further by the concept of
optoelectronic neural chips presented in Fig. 8. The
embodiments shown rely heavily on the use of computer
or microprocessor interfaced spatial light modulators
and photodetector arrays. The figure shows how
the free-space anamorphic lens system in the top left
embodiment can be replaced by a single photodetector
array with horizontal strip elements that spatially
integrate the light emerging from rows of MOSLM 2
(lower right embodiment). MOSLM 2 represents the
T_ij mask of Fig. 6. Each column of MOSLM 1 is uniformly
activated by the computer controller. This replaces
the function of the anamorphic lens system that was
needed in Fig. 6 to smear the light from the LED array
vertically onto the elements of the T_ij mask. The
optoelectronic neural chip represents a neural module
operating in an ambient light environment as compared
with a biological neural module operating in a
chemical environment. The chip thus derives some of
its operating energy from the ambient light
environment.

VI.  Discussion 

The architecture described here for partitioning a
neural net can be used in hardware implementation
and study of self-programming and learning algorithms
such as, for example, the simulated annealing
algorithm outlined here. The parallelism and massive
interconnectivity provided through the use of optics
should markedly speed up learning even for the simulated
annealing algorithm, which is known to be quite
time-consuming when carried out on a sequential
machine. The partitioning concept described is also
extendable to multilayered nets of more than three layers
and to 2-D arrangement of synaptic inputs to neurons,
as opposed to the 1-D or lineal arrangement described
here. Other learning algorithms calling for a multilayered
architecture, such as the error backpropagation
algorithm (Ref. 30) and its coherent optics implementation
(Ref. 45), can also now be envisioned optoelectronically
employing the partitioning scheme described here.

Learning algorithms in layered nets lead to analog or
multivalued T_ij. Therefore high-speed computer-controlled
SLMs with graded pixel response are called
for. Methods of reducing the needed dynamic range
of T_ij or for allowing the use of ternary T_ij are, however,
under study to enable the use of commercially available
fast nonvolatile binary SLM devices such as the
Litton/Semetex magnetooptic SLM (MOSLM) (Ref. 46). A
frame switching time better than 1/1000 s has been
demonstrated recently in our work on a 48 × 48 pixel
device by employing an external magnetic field bias.
It is worth noting that the use of optics in the architecture
described not only facilitates partitioning the net
into groups or layers but also provides the massive
interconnectivity mentioned earlier. For example, for
a neural net with a total of N = 512 neurons, the optics
enable making 2N^2 ≈ 5.24 × 10^5 programmable weighted
interconnections among the neurons in addition to
the 4N = 2048 interconnections that would be needed
in Fig. 7(b) to compute the energy E.

Assuming that material and device requirements of
the architectures described can be met and partitioned
self-organizing neural net modules will be routinely
constructed, the addition of such a module to a computer
controller through a high speed interface can be
viewed as providing the computer controller with artificial
intelligence capabilities by imparting to it neural
net attributes. These capabilities include self-organization,
self-programmability and learning, and associative
memory capability for conducting nearest-neighbor
searches. Such attributes would enable a
small computer to perform powerful computational
tasks of the kind needed in pattern recognition, and in
the solution of combinatorial optimization problems
and ill-posed problems encountered, for example, in
inverse scattering and vision, which are confined at
present to the domain of supercomputers.

A central issue in serial digital computation of complex
problems is computational complexity (Ref. 47).
Programming a serial computer to perform a complex
computational task is relatively easy. The computation
time, however, for certain problems, especially
those dealing with combinatorial searches and combinatorial
optimization, can be extensive. In neural nets
the opposite is true. They take time to program [for
example, computation of the interconnectivity matrix
of synaptic weights by outer product or correlation
(Hebbian rule) and setting the weights accordingly].
Once programmed, however, they perform the computations
required almost instantaneously. This fact is
one of the first attributes noted when working with
neural nets and has recently been elaborated on (Ref. 46).

Self-organization and learning entail the net determining by itself the
weights of synaptic interconnections among its neurons that represent the
association it is supposed to learn. In other words, the net programs
itself, thereby alleviating the programming complexity issue. One can
envision nets that learn by example when the associations the net is
supposed to learn are presented to it by an external teacher in a
supervised learning mode. This leads naturally to the more intriguing
question of unsupervised learning in such nets and analog implementations
of such learning.

different steering angles, but an optimal weight distribution can be
computed for each steering angle. From the simulation result, it is seen
that the optimum far-field pattern has similar features to the pattern
given by the Dolph-Chebyshev distribution function. In the Dolph-Chebyshev
pattern, all the sidelobes have the same level for a specified beamwidth.
A numerical example in [1] shows an 8-element array (element separation
d = 0.5λ) with 25.8 (dB) sidelobe level and 40.8° beamwidth. The optimum
pattern given by our simulation shows nearly equal level sidelobes which
are minimized for the given beamwidth.
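
As an illustration of the quantity being optimized, the sketch below evaluates the far-field pattern (array factor) of a linear array and its highest sidelobe. It is a sketch under stated assumptions only: the function names, the angular step, the choice of uniform weights in the example, and the way the main lobe is excluded are hypothetical and are not taken from the simulation reported here.

```python
import cmath
import math


def far_field(weights, spacing_wavelengths, angle_deg):
    """Magnitude of the array factor of a linear array at one observation angle.

    weights: element excitations; spacing_wavelengths: d / lambda.
    """
    psi = 2.0 * math.pi * spacing_wavelengths * math.sin(math.radians(angle_deg))
    return abs(sum(w * cmath.exp(1j * psi * n) for n, w in enumerate(weights)))


def peak_sidelobe_db(weights, spacing_wavelengths, main_lobe_half_width_deg,
                     step_deg=0.25):
    """Highest pattern level outside the main lobe, in dB relative to broadside."""
    peak = far_field(weights, spacing_wavelengths, 0.0)
    worst = 0.0
    angle = main_lobe_half_width_deg
    while angle <= 90.0:
        worst = max(worst, far_field(weights, spacing_wavelengths, angle))
        angle += step_deg
    return 20.0 * math.log10(worst / peak)


# Example: 8 elements at half-wavelength spacing with uniform weights
# (uniform weighting is an assumption used only to exercise the functions).
uniform = [1.0] * 8
level = peak_sidelobe_db(uniform, 0.5, 14.5)
```

An annealing run would perturb `weights`, recompute a cost such as `peak_sidelobe_db`, and accept or reject the change; only the pattern evaluation is shown here.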


[8] H. Barrett et al., "Optical Boltzmann machines," post-deadline
paper, OSA Topical Meeting on Optical Computing, Incline Village, NV, 1985.


V.  Discussion 

Simulated annealing is a modification of the iterative improvement
algorithm [4]. It is physically more meaningful and can be computed more
systematically than iterative improvement [4]. Physically, the simulated
annealing process is analogous to the cooling of a melt in crystal growth:
careful annealing produces a defect-free crystal, rapid annealing produces
a defective crystal or glass [3]. The probabilistic treatment with the
probability function P(ΔE) = exp(-ΔE/KT) provides a way to accept
unfavorable changes and is easy to compute. From our simulation, it has
been found that the simulated annealing algorithm seems always to give
better performance than the iterative improvement algorithm.
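
The acceptance test based on P(ΔE) = exp(-ΔE/KT) can be sketched as follows. This is a minimal illustration, assuming the constant K is absorbed into the temperature parameter; the function name `accept_change` and the `rng` argument are hypothetical.

```python
import math
import random


def accept_change(delta_e, temperature, rng=random):
    """Decide whether to keep a trial change of the configuration.

    Favorable changes (delta_e <= 0) are always kept. Unfavorable changes
    are kept with probability exp(-delta_e / temperature), which is what
    lets the search climb out of local minima at high temperature.
    """
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

Iterative improvement corresponds to the limit in which unfavorable changes are never accepted, so the same loop with a vanishing temperature behaves like the algorithm it modifies.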

Since simulated annealing is a modified iterative improvement process, it
takes a relatively long time to do an optimization problem, just as
iterative improvement does, in a computer calculation. The phased-array
synthesis in our simulation runs for 1 h or so for an array of 41 elements
on a MICRO PDP-11 computer. Finding an efficient scheme to reduce the
excessive amounts of computer time for most optimization problems has
always been of concern [5]-[7]. Otherwise, if enough computation power is
available, iterative improvement can be run from random starts many times
to approach the optimum state. Fast optodigital computing schemes similar
to those described in [8] may also be considered for phased-array synthesis
by simulated annealing. It is understood that the far field is the Fourier
transform of the array distribution function. An optical lens can be used
for computing the Fourier transform as the distribution function is
inputted to the front focal plane of the lens via, for example, an
appropriate computer-driven spatial light modulator (SLM). The Fourier
transform in the back focal plane can be recorded and fed to the
computer-controller to make the simulated annealing decision. The outcome
is fed back to the SLM to change the distribution function in the front
focal plane. The hybrid optodigital scheme will do the Fourier transform
instantly. In this fashion, the computation associated with the Fourier
transform can be virtually eliminated, assuming a high-speed SLM and
computer interface are utilized. An optoelectronic Boltzmann machine for
accelerating the selection rule has also been proposed earlier in [8]. This
process can be repeated for each step in simulated annealing. Also, a
Cauchy probability selection rule, instead of the Boltzmann selection rule,
can be used to speed up the whole annealing process further [7].

Abstract

Self-organization  and  learning  is  a  distinctive  feature  of  neural  nets 
and  processors  that  sets  them  apart  from  conventional  approaches  to  signal 
processing.  It  leads  to  self-programmability  which  alleviates  the  problem 
of  programming  complexity  in  artificial  neural  nets.  We  have  devised 
architectures  for  partitioning  an  opto-electronic  analog  of  a  neural  net 
into  distinct  layers  with  prescribed  inter-connectivity  pattern  to  enable 
stochastic  learning  by  simulated  annealing  in  the  context  of  a  Boltzmann 
machine.  Stochastic  learning  is  of  interest  because  of  its  relevance  to 
the  role  of  noise  in  biological  neural  nets.  It  can  shed  light  on  the  way 
nature  has  turned  noise  present  in  biological  nets  to  work  to  its  advantage. 
Practical  considerations  and  methodologies  for  appreciably  accelerating 
stochastic  learning  in  such  a  multi-layered  net  are  also  described.  These 
include  the  use  of  parallel  optical  computation  of  the  energy  of  the  net, 
the  use  of  fast  nonvolatile  programmable  spatial  light  modulators  to  realize 
fast  "plasticity",  optical  generation  of  random  number  arrays,  and  a  noisy 
thresholding scheme that makes stochastic learning more biologically
plausible and does not require determining the energy of the net for the
annealing schedule.

1.  INTRODUCTION 

Interest in neural network models (see, for example, [1]-[9]) and their
optical analogs (see, for example, [10]-[21]) stems from well recognized
information  processing  capabilities  of  the  brain  and  the  fit  between  what 
optics  can  do  and  what  even  simplified  models  of  neural  nets  can  offer 
toward  the  development  of  new  approaches  to  collective  signal  processing 
that  are  robust,  fault  tolerant  and  can  be  extremely  fast. 

As a result, opto-electronic analogs and implementations of neural nets
are today attracting considerable attention. The optics in these
implementations provide the needed parallelism and massive interconnectivity
and therefore a potential for realizing relatively large neural nets, while
the decision making elements are realized electronically, heralding a
possible ultimate marriage of VLSI and optics.

Architectures suitable for use in the implementation of opto-electronic
neural nets of one-dimensional and two-dimensional arrangements of neurons
have been studied and described recently [11]-[14]. Two-dimensional
architectures for opto-electronic analogs of neural nets have been
successfully used in the recognition of objects from partial information by
either complementing the missing information or by automatically generating
correct labels of the data (object feature spaces) the memory is presented
with [22].

In associative memory applications, the strength of interconnection
between the "neurons" of the net is determined by the entities one wishes
to store in the net. Specific storage "recipes" based on a Hebbian model
of learning (outer-product storage algorithm) or variations thereof are
usually employed. In that sense the memory is taught what it should know
and be cognizant of. What can be of great utility, however, is that neural
nets can also be made to be self-organizing and learning, i.e., to become
self-programming. The combination of neural nets, Boltzmann machines, and
simulated annealing concepts with high-speed opto-electronic implementations
promises to produce high-speed artificial neural net processors with
stochastic rather than deterministic rules for decision making and state
update that can form their own internal representations (connectivity
weights) of their environment, the outside world data they are presented
with, in a manner very analogous to the way the brain forms its own symbolic
representations of reality. This is quite intriguing and has far-reaching
implications for smart sensing and recognition, thinking machines, and
artificial intelligence as a whole. Our exploratory work is showing that
optics can also play a role in the implementation and speeding up of
learning algorithms (such as simulated annealing in the context of a
Boltzmann machine formalism [23]-[26] and error back propagation [27]) in
such self-teaching nets and for their subsequent use in automated robust
recognition of entities the nets have had a chance to learn earlier by
repeated exposure to them when in a learning mode. Self-organization and
learning seem to be what sets apart optical and opto-electronic
architectures and processing based on models of neural nets from other
conventional approaches to optical processing.

In  this  paper  we  are  therefore  first  concerned  with  architectures  for 
opto-electronic  implementation  of  neural  nets  that  are  able  to  program  or 
organize  themselves  under  supervised  conditions,  i.e.,  of  nets  that  are 
capable  of  (a)  computing  the  interconnectivity  matrix  for  the  associations 
they  are  to  learn,  and  (b)  of  changing  the  weights  of  the  links  between 
their neurons accordingly. Such self-organizing networks have therefore
the ability to form and store their own internal representations of the
entities or associations they are presented with. We are also concerned with
stochastic  learning  in  such  nets  and  with  methodologies  for  accelerating 
the  learning  process.  These  include  a  novel  noisy  threshold  scheme  that 
can  speed  up  the  simulated  annealing  process  in  opto-electronic  analogs  of 
neural  nets.  Results  of  computer  simulations  demonstrating  capabilities  of 
annealing  with  the  noisy  thresholding  are  presented. 

2.  PARTITIONING  ARCHITECTURES  AND  STOCHASTIC  LEARNING 

Multi-layered self-programming nets have been described recently [25],
[26], where the net is partitioned into three groups. Two are groups of
visible or external input/output units or neurons that interface with the
net environment or surroundings. The third is a group of hidden or internal
units that separates the input and output units and participates in the
process of forming internal representations of the associations the net is
presented with, as for example by "clamping" or fixing the states of the
input and output neurons to the desired associations and letting the net
run through its learning algorithm to arrive ultimately at a specific set
of synaptic weights or links between the neurons that capture, after many
iterations of the process, the underlying structure of all the associations
presented to the net. The hidden units or neurons prevent the input and
output units from communicating with each other directly. In other words,
no neuron or unit in the input group is linked directly to a neuron in the
output group and vice versa. Any such communication must be carried out
via the hidden units. Neurons within the hidden group do not communicate
with each other. They can only communicate with neurons in the input and
output groups as stated earlier.

As an example of the continuing role for optics, we describe next a
concept for partitioning an opto-electronic analog of a neural net into
input, output, and internal units with the selective communication pattern
described above in order to realize a multi-layered net analog capable of
stochastic learning, by means of a simulated annealing learning algorithm
in the context of a Boltzmann machine formalism (see Fig. 1(a)). The
arrangement shown derives from the neural network analogs we described
earlier [11]. The network, consisting of say N neurons, is partitioned
into three groups. Two groups, V_1 and V_2, represent visible or exterior
units that can be used as input and output units respectively. The third
group H are hidden or internal units. The partition is such that
N_1 + N_2 + N_3 = N, where subscripts 1, 2, 3 on N refer to the number of
neurons in the V_1, V_2 and H groups respectively. The interconnectivity
matrix, designated here as W_ij, is partitioned into nine submatrices: A, B,
C, D, E, and F plus three zero submatrices shown as blackened or opaque
regions of the W_ij mask. The LED array represents the state of the neurons,
assumed to be unipolar binary (LED on = neuron firing, LED off = neuron
not firing). The W_ij mask represents the strengths of interconnection
between neurons in a manner similar to earlier arrangements [11]. Light
from the LEDs is smeared vertically over the W_ij mask with the aid of an
anamorphic lens system (not shown in Fig. 1(a)) and light emerging from
rows of the mask is focused with the aid of another anamorphic lens system
(also not shown) onto elements of the photodetector (PD) array. Also we
assume the same scheme utilized in [11] for realizing bipolar values of
W_ij in incoherent light is adopted here, namely by separating each row of
the W_ij mask into two subrows and assigning positive values of W_ij to one
subrow and negative values of W_ij to the other, then focusing light
emerging from the two subrows separately onto pairs of adjacent photosites
connected in opposition in each of the V_1, V_2 and H segments of the
photodetector array. Submatrix A, with N_1 x N_1 elements, provides the
interconnection weights of units or neurons within group V_1. Submatrix B,
with N_2 x N_2 elements, provides the interconnection weights of units
within V_2. Submatrices C (of N_1 x N_3 elements) and D (of N_3 x N_1
elements) provide the interconnection weights between units of V_1 and H
and similarly submatrices E (of N_2 x N_3 elements) and F (of N_3 x N_2
elements) provide the interconnection weights of units V_2 and H. Units in
V_1 and V_2 cannot communicate with each other directly because locations
of their interconnectivity weights in the W_ij matrix or mask are blocked
out (blackened lower left and top right portions of W_ij). Similarly units
within H do not communicate with each other because locations of their
interconnectivity weights in the W_ij mask are also blocked out (center
blackened square of W_ij). The LED element θ can be of graded response. It
can be viewed as representing the state of an auxiliary neuron in the net
that is always on to provide a threshold level to all units by contributing
to the light focused onto only negative photosites of the PD arrays by
suitable modulation of pixels in the G column of the interconnectivity
mask. This method for introducing the threshold level is attractive as it
allows for introducing a fixed threshold (fixed θ-LED output) to all
neurons or an adaptive threshold if desired. The threshold is global when
the transmittances of pixels in G are fixed and the θ LED level is
controlled. The threshold is local if the θ LED output is fixed and the
pixel transmittances are allowed to vary.
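
The partition described above amounts to a block mask over the weight matrix. The following sketch builds such a mask; the ordering of the groups along the rows and columns (V_1, then H, then V_2) is inferred from the description of the blackened regions and is an assumption, as are the function name `partition_mask` and the example group sizes.

```python
def partition_mask(n1, n2, n3):
    """Return an N x N mask of 0/1 entries, with N = n1 + n2 + n3.

    A 1 marks a location where a weight may be written (submatrices
    A, B, C, D, E, F); a 0 marks an opaque location. Units are ordered
    V1 (n1 units), H (n3 units), V2 (n2 units), an ordering assumed here
    so that the H-H block sits at the center of the mask.
    """
    groups = ["V1"] * n1 + ["H"] * n3 + ["V2"] * n2
    blocked = {("V1", "V2"), ("V2", "V1"), ("H", "H")}
    n = len(groups)
    return [
        [0 if (groups[i], groups[j]) in blocked else 1 for j in range(n)]
        for i in range(n)
    ]


# Hypothetical sizes: 2 input units, 2 output units, 3 hidden units.
mask = partition_mask(2, 2, 3)
open_locations = sum(sum(row) for row in mask)
```

Multiplying a candidate weight matrix element by element with this mask enforces the communication pattern, which is the role the opaque regions play in the optical mask.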


Fig. 1. Architecture for opto-electronic analog of layered self-programming
net. (a) Partitioning concept and (b) arrangement for rapid
determination of the net's global energy E for use in learning
by simulated annealing.

We have described elsewhere in some detail [28] how the scheme can be made
self-programming, with the ability to modify the weights of synaptic links
between its neurons to form internal representations of the input/output
associations or patterns presented to it. This is done by using a
computer-controlled nonvolatile spatial light modulator to implement the
W_ij mask in Fig. 1(a), including a computer/controller as shown, and
repeatedly applying the simulated annealing procedure with a Boltzmann or
other stochastic state update rule while collecting co-occurrence
statistics on the states of the neurons at the end of each run, when the
net has reached thermodynamic equilibrium.

3. ACCELERATED LEARNING


The  stochastic  learning  by  simulated  annealing  procedure  was  originally 
conceived  for  serial  computation.  When  dealing  with  parallel  optical 
computing  systems  of  the  kind  we  described,  it  does  not  make  sense  to 
strictly  adhere  to  a  serial  algorithm.  Modifications  that  can  take 
advantage  of  the  available  parallelism  of  optics  to  speed  up  stochastic 
learning  should  be  considered.  In  this  section  we  offer  several  such 
modifications that can markedly speed up stochastic learning in
opto-electronic implementations as compared to serial digital
implementations.

Learning by simulated annealing requires calculating the energy E of
the net [7], [29],

$$E = -\frac{1}{2} \sum_i u_i s_i \qquad (1)$$

where s_i is the state of the i-th neuron and

$$u_i = \sum_{j \neq i} W_{ij} s_j - \theta_i + I_i \qquad (2)$$

is the activation potential of the i-th neuron, with θ_i and I_i being
respectively the threshold level and external input of the i-th neuron.
Equation (2) can be written in the form,

$$u_i = \sum_j W_{ij} s_j \qquad (3)$$

by absorbing θ_i and I_i in the weight matrix W_ij. This can be done by adding
a G column to the W_ij matrix to furnish θ_i as described earlier. A similar
procedure can be used to furnish I_i by adding another column with
transmittances proportional to I_i whose transmitted light is focused onto
the positive photosites of the photodetector array in Fig. 1(a). The above
method of introducing θ_i and I_i suggests also that random (noise)
components of both θ_i and I_i can be introduced by focusing a random array
of light spots, whose intensities are allowed to vary randomly and
independently with time, directly onto the positive and negative photosites
of the PD array of Fig. 1(a). In this fashion deterministic and random
components of θ_i and I_i can be realized. Taken together, the random
components of θ_i and I_i can be viewed as a random bipolar noisy threshold.
We will return to this point later in our discussion of annealing with
noisy threshold.
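
The absorption of the threshold and external input into the weight matrix can be checked numerically with a small sketch. The representation chosen here, two extra columns driven by two auxiliary units that are always on, is one way to realize the idea and is an assumption, as are the function names and the example values.

```python
def activation(w, s, theta, ext):
    """Activation potential with explicit threshold and external input."""
    n = len(s)
    return [
        sum(w[i][j] * s[j] for j in range(n) if j != i) - theta[i] + ext[i]
        for i in range(n)
    ]


def augment(w, theta, ext):
    """Append a threshold column (-theta_i) and an input column (I_i).

    The diagonal is zeroed so that the sum over all j reproduces the
    sum over j != i of the explicit form.
    """
    n = len(w)
    return [
        [(0 if i == j else w[i][j]) for j in range(n)] + [-theta[i], ext[i]]
        for i in range(n)
    ]


def activation_augmented(w_aug, s):
    """Activation potential as a plain weighted sum over the augmented state."""
    s_aug = list(s) + [1, 1]  # two auxiliary units that are always on
    return [sum(wij * sj for wij, sj in zip(row, s_aug)) for row in w_aug]


def energy(u, s):
    """Global energy: minus one half of the sum of u_i * s_i."""
    return -0.5 * sum(ui * si for ui, si in zip(u, s))


# Hypothetical three-neuron example.
w = [[0, 1, -1], [1, 0, 1], [-1, 1, 0]]
s = [1, 0, 1]
theta = [1, 0, 2]
ext = [0, 1, 0]
u_explicit = activation(w, s, theta, ext)
u_absorbed = activation_augmented(augment(w, theta, ext), s)
```

Both routes give the same activation potentials, so a threshold or input can be changed by rewriting one column of the mask rather than by altering the neuron electronics.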

A simplified version of a rapid scheme for obtaining E
opto-electronically is shown in Fig. 1(b). The scheme requires the use of
an electronically addressed nonvolatile binary (on-off) spatial light
modulator consisting of a single column of N pixels. A suitable candidate
is a parallel addressed magneto-optic spatial light modulator (MOSLM) [30],
in particular one consisting of a single column of N pixels that are driven
electronically by the same signal driving the LED array in order to
represent the state vector s of the net. A fraction of the focused light
emerging from each row of the W_ij mask is deflected by the beam splitter BS
onto the individual pixels of the column MOSLM such that light from adjacent
pairs of subrows of W_ij falls on one pixel of the MOSLM. The MOSLM pixels
are overlaid by a checkered binary mask as shown. The opaque and
transparent pixels in the checkered mask are staggered in such a fashion
that light emerging from the left subcolumn will originate from the positive
subrows W^+_ij of W_ij only and light emerging from the right subcolumn will
originate from the negative subrows W^-_ij of W_ij. By separately focusing
the light from the left and right subcolumns as shown onto two
photodetectors and subtracting and halving their outputs one obtains,

$$E = -\frac{1}{2} \sum_i \Big( \sum_j W^{+}_{ij} s_j - \sum_j W^{-}_{ij} s_j \Big) s_i = -\frac{1}{2} \sum_i \sum_j W_{ij} s_j s_i \qquad (4)$$

which is the required global energy.
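
The two-detector arrangement can be mimicked numerically to confirm that the positive and negative subrows together yield the global energy. This is a sketch only: the function names and the example weights are hypothetical, and the optical gating by the MOSLM pixels is modeled simply as multiplication by s_i.

```python
def split_bipolar(w):
    """Split a bipolar matrix into non-negative parts with w = w_pos - w_neg."""
    w_pos = [[max(x, 0) for x in row] for row in w]
    w_neg = [[max(-x, 0) for x in row] for row in w]
    return w_pos, w_neg


def energy_from_subrows(w_pos, w_neg, s):
    """Energy from the two detector sums.

    Row i contributes only when s_i = 1, which models the MOSLM pixel
    passing or blocking the light of that row.
    """
    n = len(s)
    left = sum(s[i] * sum(w_pos[i][j] * s[j] for j in range(n)) for i in range(n))
    right = sum(s[i] * sum(w_neg[i][j] * s[j] for j in range(n)) for i in range(n))
    return -0.5 * (left - right)


def energy_direct(w, s):
    """Energy computed directly from the bipolar weights."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


# Hypothetical symmetric weights with zero diagonal.
w = [[0, 2, -1], [2, 0, -3], [-1, -3, 0]]
w_pos, w_neg = split_bipolar(w)
```

For example, `energy_from_subrows(w_pos, w_neg, [1, 1, 1])` and `energy_direct(w, [1, 1, 1])` agree, since the split only regroups the terms of the double sum.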

Stochastic learning by simulated annealing in the opto-electronic neural
net analogs of Fig. 1 requires, as detailed elsewhere [28], fast random
number generation for use in random drawing and switching of state of
neurons from H and from H and V_2. Another random number is also needed
to execute the stochastic state update rule. Although fast digital
pseudo-random number generation of up to 10^ [sec^-1] is feasible [31]
and can be used to help speed up digital simulation of the learning
algorithm, this by itself is not sufficient to make a large impact,
especially when the total number of neurons in the net is large.
Opto-electronic random number generation is also possible although at a
slower rate of about 10^ [sec^-1]. Despite the slower rate of generation,
opto-electronic methods have advantages that will be elaborated upon
below. An opto-electronic method for generating the Boltzmann probability
factor needed in the simulated annealing algorithm [28] employing speckle
statistics is described in [32], and optical generation of random number
arrays by photon counting image acquisition systems or clipped laser
speckle has also been recently described [33]-[36]. These photon counting
image acquisition systems have the advantage of being able to generate
normalized random numbers with any probability density function. A more
important advantage of optical generation of random number arrays, however,
is the ability to exploit the parallelism of optics to modify the simulated
annealing and the Boltzmann machine formalism detailed above in order to
achieve significant improvement in speed. As stated earlier, with parallel
optical random number generation, a spatially and temporally uncorrelated
linear array of percolating light spots of suitable size can be generated
and imaged onto the photodetector array (PDA) of Fig. 1 directly such that
both the positive and negative photosites of the PDA are subjected
to random irradiance. This introduces a random (noise) component in θ_i and
I_i of eq. (2) which can be viewed, as stated earlier, as a bipolar noisy
threshold. The noisy threshold produces in turn a noisy component in the
energy in accordance with eq. (2). The magnitude of the noise components can
be controlled by varying the standard deviation of the random light
intensity array irradiating the PDA. The noisy threshold therefore produces
random controlled perturbation or "shaking" of the energy landscape of the
net. This helps shake the net loose whenever it gets trapped in a local
energy minimum. The procedure can be viewed as generating a controlled
deformation or tremor in the energy landscape of the net to prevent
entrapment in a local energy minimum and thereby ensure convergence to a
state of global energy minimum. Both the random drawing of neurons (more
than one at a time is now possible) and the stochastic state update of the
net are now done in parallel at the same time. This leads to significant
acceleration of the simulated annealing process. In the following, results
of numerical simulations aimed at gaining insight into the performance of
the noisy threshold scheme are presented.
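
A much simplified software picture of the parallel noisy-threshold update is sketched below. It is not the rule used in the simulations of this report: here every neuron simply thresholds its activation potential plus an independent noise sample drawn uniformly from (-T, T), all neurons are updated at once, and the noise amplitude is lowered according to a schedule. The function names, the schedule values in the example, and the fixed seed are assumptions, and a fully parallel update of this kind is not guaranteed to settle in the global minimum.

```python
import random


def noisy_threshold_step(w, s, noise_amplitude, rng):
    """One parallel update of all neurons with a noisy threshold.

    Each neuron fires if its activation potential plus an independent
    uniform noise sample in (-noise_amplitude, noise_amplitude) is positive.
    """
    n = len(s)
    u = [sum(w[i][j] * s[j] for j in range(n)) for i in range(n)]
    return [
        1 if u[i] + rng.uniform(-noise_amplitude, noise_amplitude) > 0 else 0
        for i in range(n)
    ]


def anneal(w, s, schedule, rng):
    """Run the noisy update for each (iterations, amplitude) pair in turn."""
    for iterations, amplitude in schedule:
        for _ in range(iterations):
            s = noisy_threshold_step(w, s, amplitude, rng)
    return s


# Hypothetical four-neuron net and an illustrative decreasing schedule.
rng = random.Random(0)
w = [[0, 1, -1, 1], [1, 0, 1, -1], [-1, 1, 0, 1], [1, -1, 1, 0]]
final_state = anneal(w, [1, 0, 1, 0], [(10, 2.5), (10, 1.5), (10, 0.5), (10, 0.1)], rng)
```

Note that no energy evaluation appears anywhere in the loop; the noise enters only through the threshold, which is what allows all neurons to be perturbed and updated in the same step.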

4.  SIMULATION  RESULTS 

For the purposes of this simulation we form a fully interconnected
(single layer) neural net with a random bipolar binary weights matrix with
diagonal elements set to zero. The number of neurons N is 16. The weights
are symmetrical. Figure 2 shows the density of states (energy histogram)
of the net, where we calculate the energies of all the possible
(2^16) configurations of the net. The Y (vertical) axis shows the number
of configurations (out of 2^16) with the same energy, and their
corresponding energy is represented by the X (horizontal) axis. The
low energy configurations correspond to states near the very left of the
curve, of which a good annealing scheme should find one. To avoid lengthy
simulation time, we do not in the following exhaust the simulation for all
possible configurations (2^16). Instead, we randomly select 50
configurations as the test sample space. The energy histogram of these 50
configurations is shown in Fig. 3. In Fig. 4 is shown the energy histogram
of the states to which the net converges when initiated with the 50
configurations of the test space. The histogram was obtained by initiating
the net with each of the 50 states, finding the final
state to which the net converges by iteratively applying the customary
neural net state update rule [7], namely,

$$s_i = \begin{cases} 1 & \text{if } u_i > 0 \\ 0 & \text{if } u_i \le 0 \end{cases}$$

which amounts to performing a steepest gradient descent search into local
minima of the net, and then calculating the energy of the final state. The
plot means that there are Y initial states (out of the 50 configurations
of the sample space) which converge (in the sense of the above
conventional steepest descent) to local minima with the same energy X. We see
that a fair number of initial states end up trapped in local energy minima
at high energy states because the steepest gradient descent search method
involved is deterministic and does not have provisions for escaping from a
local minimum. This curve can serve as a reference to test the performance
of an annealing scheme's ability to escape from a local minimum and find
the global minimum. In Figs. 5 and 6, we display the energy histograms of
the convergent states when different annealing schemes were used. In these
figures, each of the 50 configurations of the test space is used as the input
vector 100 times, and the statistics of the convergent states are
collected. The results employing the simulated annealing algorithm
[23], [24] with random drawing of neurons one neuron at a time, and
stochastic update employing noise uniformly distributed in the range (-T,T),
are shown in Fig. 5. The annealing or cooling schedule used was: 96 @ 10,
160 @ 5, 96 @ 3, 96 @ 1, 96 @ 0.5, and 96 @ 0.01, where I @ T specifies the
number of iterations I at temperature T. Figure 6 shows the results
obtained with the noisy threshold scheme where the deterministic component
of the threshold is taken to be zero and independent bipolar noise
components uniformly distributed in the (-T,T) range are added to the
thresholds every iteration. The probability of the i-th neuron switching
its state was taken to be inversely proportional to its activation
potential u_i. The noise amplitude T was reduced gradually every
specified number of iterations to allow the net to find the state of
global energy minimum or one close to it. The following annealing schedule
was utilized: 10 @ 2.5, 10 @ 1.5, 10 @ 0.5, and 10 @ 0.1. It is seen that
annealing with noisy threshold finds states of global energy minima equally
as well as the conventional simulated annealing scheme. The number of
iterations involved is, however, considerably less: 40 as compared to 540 in
the conventional simulated annealing scheme. It is worth noting also that
the noisy threshold scheme does not require knowing the energy of the net
to apply the rule described in the preceding section. This further
accelerates the search for the global minimum and can markedly shorten
learning time [28].
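
The reference measurements of this section (the density of states and the histogram of states reached by the deterministic update rule) can be outlined in a short sketch. It is an illustration under stated assumptions rather than the code behind Figs. 2 to 4: the net size is reduced to N = 8 so that the exhaustive enumeration stays quick in plain Python, the neurons are updated one at a time, and the seed and function names are hypothetical.

```python
import itertools
import random
from collections import Counter


def random_bipolar_weights(n, rng):
    """Symmetric matrix of +1/-1 weights with a zero diagonal."""
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.choice((-1, 1))
    return w


def energy(w, s):
    """Global energy of a unipolar binary state s."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def density_of_states(w):
    """Histogram of the energies of all 2^N configurations."""
    n = len(w)
    return Counter(energy(w, s) for s in itertools.product((0, 1), repeat=n))


def descend(w, s):
    """Apply the deterministic update rule, one neuron at a time, until stable."""
    s = list(s)
    changed = True
    while changed:
        changed = False
        for i in range(len(s)):
            u = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if u > 0 else 0
            if new != s[i]:
                s[i] = new
                changed = True
    return s


rng = random.Random(0)   # fixed seed (assumption) for repeatability
N = 8                    # the text uses N = 16; 8 keeps the enumeration quick
W = random_bipolar_weights(N, rng)
histogram = density_of_states(W)
samples = [[rng.randint(0, 1) for _ in range(N)] for _ in range(50)]
finals = [descend(W, s) for s in samples]
descent_histogram = Counter(energy(W, f) for f in finals)
```

With symmetric weights and a zero diagonal, each single-neuron update can only lower the energy or leave it unchanged, so `descend` stops at a local minimum; comparing `descent_histogram` with `histogram` shows how many starts end above the lowest energy.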


Fig. 2. Density of states for 2^16 possible configurations.

Fig. 3. Density of states of the sample space (of 50 randomly selected
configurations).

Fig. 4. Density of convergent states in sample space (conventional neural
net update rule).

Fig. 5. Density of convergent states in sample space (conventional
simulated annealing with noise uniformly distributed in (-T,T)).

Fig. 6. Density of convergent states in sample space (noisy threshold
scheme using noise uniformly distributed in (-T,T)).

5.  DISCUSSION 

We  have  described  an  architecture  for  partitioning  an  opto-electronic 
analog  of  a  neural  net  to  form  a  multilayered  net  that  permits 
self-organization  and  learning  when  computer  controlled  nonvolatile  spatial 
light  modulators  are  utilized  to  realize  the  required  plasticity.  The 
focus  here  is  on  stochastic  learning  as  opposed  to  deterministic  learning 
because  it  can  account  for  the  role  of  noise  in  biological  neural  nets.  We 
also described opto-electronic architectures that can be used for fast
determination of the energy of the net and therefore can accelerate the
simulated annealing process involved in stochastic learning, where "optical
random arrays" can also be used to accelerate the process further.

However,  when  parallel  optical  computing  is  employed,  it  is  not  necessary 
to  adhere  to  a  serial  simulated  annealing  algorithm.  We  have  shown  that 
departure  from  the  conventional  simulated  annealing  algorithm  through  the 
use  of  a  noisy  thresholding  scheme  promises  to  markedly  accelerate 
stochastic  learning  in  opto-electronic  implementation  of  multilayered 
neural  nets,  make  the  procedure  more  biologically  plausible,  and  make 
stochastic learning practical.

Abstract

Self-organization and learning are important attributes of neural nets that set them
apart from other approaches to signal processing. The study of stochastic learning by
simulated annealing in the context of a Boltzmann machine is attractive because it
could shed light on the role of noise in biological neural nets and because it can lead
to artificial neural nets that can be switched between two distinct operating modes
depending on the noise level in the network. At finite noise level (or temperature) the net
can be operated in a "soft" mode where learning can take place by automated synaptic
modifications. Once learning is completed the net is "hardened" (or frozen) by reducing
the noise level or temperature to zero, and then acts as associative memory. We present
the results of a numerical and experimental study aimed at opto-electronic realization
of such networks. The results include: (a) fast annealing by noisy thresholding, which
demonstrates that the global energy minimum of a small analog test network can be
reached in a matter of a few tens of neuron time constants, and (b) stochastic learning with
binary weights, which paves the way for the use of fast binary and nonvolatile spatial
light modulators to realize synaptic modifications.

1  System  Architecture 

Optics and opto-electronic architectures and techniques can play an important role in
the study and implementation of self-programming networks and in speeding up the
execution of learning algorithms. Learning requires partitioning a net into layers with
a prescribed communication pattern among them. Fig. 1(a) describes a method for
partitioning an opto-electronic analog of a neural net into input, output, and internal
groups (layers) of neurons, with a selective communication pattern among neurons within
each layer and between layers, that is capable of stochastic learning by means of a
simulated annealing algorithm in the context of a Boltzmann machine formalism.
The network, consisting of N neurons, is partitioned into three groups. Two groups, V1
and V2, represent visible or environmental units that can be used as input and output
units respectively. The third group, H, consists of hidden units. The partition is such that
N1 + N2 + N3 = N, where subscripts 1, 2, and 3 on N refer to the number of neurons
in the V1, V2 and H groups respectively. The interconnectivity matrix, designated
here as Wij, is partitioned into nine submatrices: A, B, C, D, E, and F plus three
zero submatrices shown as blackened or opaque regions of the Wij mask. The LED
array represents the state of the neurons, assumed to be unipolar binary (LED on =
neuron firing, LED off = neuron not firing). The Wij mask represents the strengths
of the interconnections between neurons. Light from the LEDs is smeared vertically
over the Wij mask with the aid of an anamorphic lens system (not shown in Fig.
1(a)), and light emerging from rows of the mask is focused with the aid of another
anamorphic lens system (also not shown) onto elements of the photodetector (PD)
array. Bipolar values of Wij can be realized in incoherent light by separating each row
of the Wij mask into two subrows and assigning positive values Wij+ to one subrow and
negative values Wij- to the other, then focusing light emerging from the two subrows
separately onto pairs of adjacent photosites connected in opposition in each of the V1,
V2 and H segments of the PD array as described elsewhere [2]. Submatrix A, with
N1xN1 elements, provides the interconnection weights of units or neurons within group
V1. Submatrix B, with N2xN2 elements, provides the interconnection weights of units
within V2. Submatrices C (of N1xN3 elements) and D (of N3xN1 elements) provide
the interconnection weights between units of V1 and H, and similarly submatrices E
(of N2xN3 elements) and F (of N3xN2 elements) provide the interconnection weights
between units of V2 and H. Units in V1 and V2 cannot communicate with each other directly
because the locations of their interconnectivity weights in the Wij matrix or mask are
blocked out (blackened lower left and top right portions of Wij). Similarly, units within
H do not communicate with each other because the locations of their interconnectivity
weights in the Wij mask are also blocked out (center blackened square of Wij). The
LED element θ is of graded response. It can be viewed as representing the state of an
auxiliary neuron in the net that is always on to provide a threshold level to all units
by contributing to the light focused onto only the negative photosites of the PD array by
suitable modulation of pixels in the G column of the interconnectivity mask. This
method for introducing the threshold level is attractive as it allows for introducing
a fixed threshold to all neurons or an adaptive threshold if desired. It can also be
employed to alter the energy landscape of the net adaptively in accordance with the
behavior of other parameters of the net. Figure 1(b) shows the arrangement for rapid
determination of the net's energy E for use in learning by simulated annealing. A
computer works as the system controller to calculate Pij and P'ij and also to control the
magneto-optic spatial light modulator (MOSLM) which implements the interconnectivity
matrix W. This architecture allows stochastic learning by simulated annealing in the
context of a Boltzmann machine. The learning algorithm for the Boltzmann machine can
be summarized as follows:

1. Choose one mapping or associated pair that the net is required to learn, and
present it to the net. The associated pair consists of two unipolar binary vectors,
one an input vector and the other an output vector.

2. Clamp the input vector to the V1 neurons, and the corresponding output vector
to the V2 neurons.

3. Employ the simulated annealing method in energy space to find low energy
configurations at the given V1 and V2. The final temperature in the cooling schedule is
called T0 and will be used later as an annealing parameter in Cross-Entropy or
G-space. During this step, random drawing and change of only the states of the
hidden neurons (H) take place.

4. Repeat steps 2-3 VI times for all associations the net is required to learn, and
collect co-occurrence statistics, i.e. determine the probabilities Pij of the ith and
jth neurons being in the same state, i.e. both being on or both being off.

5. Unclamp the V2 neurons and repeat steps 3-4 for all input vectors, and collect
co-occurrence statistics again, i.e. determine the probabilities P'ij of the ith and jth
neurons being in the same state. During this step, random drawing and change
of the states of both the H and the V2 neurons take place.

6. All weights in the net are modified by increasing the synaptic weight (Wij)
between the ith and jth neurons by a small amount δ if Pij - P'ij > 0, and otherwise
decreasing the weight by the same amount. Note that this requires multivalued Wij,
or incremental variation of Wij, which requires the use of graded response spatial
light modulators for realizing synaptic modifications in opto-electronic
implementations.

7. We call steps 1-6 a learning cycle. The learning cycle consists of two phases.
Phase one involves clamping the input and output units to the associated pairs.
Phase two involves clamping the input units to the input vector alone and letting
the output units run free with the hidden units. The learning cycle is repeated
again and again and is halted after Pij - P'ij is close to zero for every i and j.
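
The statistics gathering of steps 4-5 and the weight update of step 6 can be sketched
in a few lines of Python. This is only an illustrative sketch, not the controller code
used with the hardware: the function names, the array layout (one row per sampled
state), the `allowed` mask of permitted connections, and the default value of delta
are all assumptions introduced here.

```python
import numpy as np

def cooccurrence(states):
    """Fraction of samples in which units i and j share a state (both on or both off)."""
    s = np.asarray(states, dtype=float)           # shape (samples, N), entries 0 or 1
    same = s.T @ s + (1.0 - s).T @ (1.0 - s)      # counts of on/on plus off/off
    return same / s.shape[0]

def update_graded_weights(W, p_clamped, p_free, allowed, delta=0.05):
    """Step 6: raise Wij by delta where Pij - P'ij > 0, otherwise lower it by delta."""
    step = np.where(p_clamped - p_free > 0, delta, -delta)
    return W + step * allowed                     # blocked connections stay unchanged
```

Here `p_clamped` plays the role of Pij (phase one) and `p_free` the role of P'ij
(phase two); both would be obtained by calling `cooccurrence` on the states collected
at the end of each annealing run.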

The  learning  procedure  described  above  can  be  supported  in  the  opto-electronic 
hardware  environment  described  previously. 
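
The partitioning of the interconnectivity mask that this hardware realizes can be
sketched as follows. The sketch assumes the units are ordered V1, H, V2 along both
axes of the mask, which is inferred from the description of the blocked regions
(lower left, top right, and center); the function name and the 0/1 encoding of
open and opaque regions are likewise assumptions made for illustration.

```python
import numpy as np

def partition_mask(n1, n2, n3):
    """Return an N x N mask (1 = open, 0 = opaque), N = n1 + n2 + n3.

    Units are assumed to be ordered V1 (n1 units), H (n3 units), V2 (n2 units).
    """
    n = n1 + n2 + n3
    v1 = slice(0, n1)
    h = slice(n1, n1 + n3)
    v2 = slice(n1 + n3, n)
    mask = np.ones((n, n), dtype=int)
    mask[v1, v2] = 0    # no direct V1 -> V2 links (top right block)
    mask[v2, v1] = 0    # no direct V2 -> V1 links (lower left block)
    mask[h, h] = 0      # no links among hidden units (center block)
    return mask
```

For a 4-2-4 arrangement, `partition_mask(4, 4, 2)` leaves the six submatrices A to F
open and blocks the remaining three. Whether the diagonal (self-connection) elements
inside A and B are used is not stated in the text, so the sketch leaves them open.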

2  Fast  Annealing  With  Noisy  Threshold 

With the aid of an optical random number generator, a spatially and temporally
uncorrelated linear array of percolating light spots of suitable size and intensity range
can be generated and imaged directly onto the PD array of Fig. 1 such that both the
positive and negative photosites of the PD array are subjected to random irradiance.
This introduces a random (noise) component in the threshold. The noisy threshold
in turn produces a noisy component in the energy function of the net. The magnitude
of the noise components can be controlled by varying the light intensity of the array
irradiating the PD array. The noisy threshold therefore produces a controlled random
perturbation or "shaking" of the energy landscape of the net. This helps shake the net
loose whenever it gets trapped in a local energy minimum. The procedure can be viewed
as generating controlled, gradually decreasing deformations or tremors in the energy
landscape of the net that prevent entrapment in a local energy minimum and help the net
settle into the global minimum energy state or one close to it. Both the random drawing of
neurons (more than one at a time is now possible) and the stochastic state update of
the net are now done in parallel at the same time. This leads to significant acceleration
of the simulated annealing process. Electronic control of the random light intensity
enables realizing any annealing profile. We presented the results of a numerical
study elsewhere [1]. In the following, results of an experimental study aimed at gaining
insight into the performance of the noisy threshold scheme are presented.
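
A numerical analog of the noisy threshold scheme can be sketched as below. It is a
minimal sketch under stated assumptions rather than the algorithm of [1]: the energy
is taken to be the usual quadratic form for unipolar binary neurons with symmetric
weights, the noise is drawn uniformly in (-T, T), and a random subset of neurons
(controlled by the hypothetical parameter `update_fraction`) is updated in each step.

```python
import numpy as np

def energy(W, s, theta):
    """Energy of state s for symmetric weights W and thresholds theta (assumed form)."""
    return float(-0.5 * s @ W @ s + theta @ s)

def noisy_threshold_anneal(W, theta, noise_levels, rng, update_fraction=0.5):
    """Update many neurons per step while the threshold noise amplitude T is lowered."""
    n = W.shape[0]
    s = rng.integers(0, 2, size=n).astype(float)      # random unipolar start state
    for T in noise_levels:
        noise = rng.uniform(-T, T, size=n)            # random part of the threshold
        drawn = rng.random(n) < update_fraction       # neurons drawn in this step
        fires = (W @ s - theta + noise) > 0           # noisy threshold decision
        s = np.where(drawn, fires.astype(float), s)
    return s
```

A linearly decreasing profile could be supplied as `np.linspace(T_start, 0.0, steps)`.
The final state is not guaranteed to be the global minimum; in practice one would
repeat the run many times and compare `energy` values, as done in the experiment
reported in the next section.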

3  Experimental  Results 

An  annealing  experiment  based  on  the  noisy  threshold  algorithm  in  an  opto-electronic 
neural  net  is  reported.  A  television  screen  tuned  to  an  empty  channel  where  no  TV 
station  operates  is  used  as  the  spatio-temporal  optical  noise  source.  We  use  a  lens 
to  project  the  optical  noise  pattern  (snow  pattern)  onto  the  photodetector  array  of  an 
opto-electronic  neural  net  consisting  of  16  unipolar  binary  neurons  of  the  type  described 
elsewhere  [2].  The  connectivity  matrix  of  the  network  was  the  same  random  ternary 
matrix  utilized  in  earlier  work  [1].  The  brightness  of  the  TV  screen  is  controlled  by 
the  D/A  output  of  a  MASSCOMP  computer,  and  the  convergent  state  is  monitored  by 
the  A/D  input  of  the  same  computer.  A  photograph  of  the  experimental  arrangement 
is shown in Figure 2. We investigated four types of cooling profiles: linear, concave,
convex, and staircase, illustrated in Figure 3. For each cooling profile, we investigated 5
annealing time intervals: 100, 200, 500, 1000, and 2000 ms. For each cooling profile and
annealing time interval, we repeated the annealing 100 times to collect sufficient statistics,
and found the probability that the system converges to its global minimum energy state.
The  experimental  results  obtained  show  that  the  setup  can  find  the  global  energy 
minimum  of  an  artificial  neural  net  of  16  neurons  in  2000  ms  which  corresponds  to 
32 time constants of the neurons in the test network. A net of neurons with a response
time of 1 μsec would therefore anneal in a few tens of microseconds, and this is expected
to be independent of the number of neurons in the net as long as parallel injection of
noise in the network is implemented. The cooling profile had no observable effect on
this result. The probabilities of convergence to a global minimum as a function of the
annealing duration for different annealing profiles are shown in Table 1.
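
The four cooling profiles can be generated numerically along the following lines.
The exact curves of Figure 3 are not reproduced in the text, so the functional forms
below (a parabola for the concave and convex cases, four equal steps for the
staircase) are assumptions chosen only to illustrate the idea; `cooling_profile` and
its parameters are hypothetical names.

```python
import numpy as np

def cooling_profile(kind, t, duration, t_start=1.0, steps=4):
    """Noise amplitude at time t for a profile that falls from t_start to 0."""
    x = np.clip(np.asarray(t, dtype=float) / duration, 0.0, 1.0)
    if kind == "linear":
        level = 1.0 - x
    elif kind == "concave":
        level = 1.0 - x**2              # stays high early, drops late
    elif kind == "convex":
        level = (1.0 - x) ** 2          # drops early, flattens late
    elif kind == "staircase":
        level = 1.0 - np.floor(x * steps) / steps
    else:
        raise ValueError(f"unknown profile: {kind}")
    return t_start * level
```

As a consistency check on the numbers quoted above, a 2000 ms anneal that spans 32
neuron time constants implies a time constant of 2000 / 32 = 62.5 ms for the neurons
of the test network.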

4  Stochastic  Learning  With  Binary  Weights 

The Boltzmann machine learning algorithm described earlier employs graded weights.
However, from a practical viewpoint, learning in artificial neural nets can be simplified
considerably if binary weights can be used. This would pave the way to using fast
nonvolatile binary spatial light modulators (SLMs) such as the magneto-optic SLM and
the ferroelectric liquid crystal SLM. However, a Boltzmann machine is basically an
adaptive system. If the step size of adaptive changes is too large and the sensitivity of
the system response to the error signal is high, the machine will generally become
unstable. Since a traditional Boltzmann machine has a high sensitivity in response to the
error signal, i.e., it responds to the error signal (Pij - P'ij) to modify synaptic weights even
when the error signal is very small, small weight variations are required to prevent
the system from becoming unstable. However, in a binary weight net (Wij = 1, -1)
the step size of adaptive change is large and fixed (-2 or 2). In order to prevent the
system from becoming unstable, we increase the inertia of the weights, i.e. weights do not
change when a small value of Pij - P'ij occurs. As a result, the learning procedure of the
Boltzmann machine in a binary weight net is identical to the procedure of the
graded weights net stated in the system architecture section, except for step 6, which is
modified as follows: if Pij - P'ij > M, set Wij = 1; if Pij - P'ij < -M, set Wij = -1;
otherwise, no change, where M ∈ [0, 1] is a fixed constant.
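
The modified step 6 can be sketched as a thresholded update. The function name, the
array arguments, and the `allowed` mask of permitted connections are assumptions made
for illustration; only the rule itself (set to 1 above M, to -1 below -M, otherwise
leave unchanged) is taken from the text.

```python
import numpy as np

def update_binary_weights(W, p_clamped, p_free, allowed, M=0.1):
    """Binary-weight version of step 6 with a dead zone of half-width M."""
    diff = p_clamped - p_free                 # Pij - P'ij
    open_link = allowed == 1
    W_new = W.copy()
    W_new[(diff > M) & open_link] = 1         # strong positive error: weight becomes +1
    W_new[(diff < -M) & open_link] = -1       # strong negative error: weight becomes -1
    return W_new                              # |diff| <= M: weight keeps its value
```

The dead zone is what provides the "inertia": with M = 0 every weight would be
rewritten on every learning cycle, while with M close to 1 the weights would almost
never change.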

The goal of the Boltzmann machine is to minimize the Cross-Entropy G by means
of modifying the weights of the net in a certain order. G is an information-theoretic
measure of the distance between the probability distributions when an environmental
input is present in the net and when it is free running with no environmental
input applied, and is given by

$$ G = \sum_{\alpha} P^{+}(V_{\alpha}) \ln \frac{P^{+}(V_{\alpha})}{P^{-}(V_{\alpha})} \qquad (1) $$

where P+(Vα) is the probability of the visible units being in the α state when the visible
units are subjected to the environmental input. Namely, P+(Vα) represents the desired
or specified probability for the α state. P-(Vα) is the corresponding probability when
the net is free-running. Namely, P-(Vα) represents the actual probability generated
by the net for the α state. P-(Vα) depends on the weights Wij, and so G can be
altered by changing Wij. Since, in general, there are local minima in G space, gradient
descent search will find a local minimum instead of the global minimum. In order to
reach the global minimum in G space, introduction of noise in G space is required.
However, if the noise level is too large, the network cannot generate the specified or
desired environmental distribution. A systematic way for adding noise in G space, i.e.
an annealing scheme in G space, has not yet been studied in detail. Here we propose
the use of the final temperature T0 of the simulated annealing schedule used in the
energy space E as the annealing parameter in G space, since P-(Vα) is a function of T0.
In the first few learning cycles, we use high values of T0. This will provide a high level
of noise in G space. The value of T0 is decreased gradually along with the number of
learning cycles. Accordingly, a simulated annealing process in G space is accomplished
by decreasing the final temperature T0, in a similar way to the simulated annealing
process in energy space, which is accomplished by decreasing the annealing temperature T.
Note that an annealing schedule with a high value of T0 is equivalent to a short time
interval annealing schedule in E space, i.e., both cases can generate a high level of noise
in G space, and vice versa. Accordingly, the annealing time interval in E space can also be
used as an annealing parameter in G space. As a result, a simulated annealing process
in G space can also be accomplished by gradually increasing the annealing time interval
in E space along with the number of learning cycles. Results of computer simulations
of stochastic learning by simulated annealing in a Boltzmann machine employing both
graded and binary weights are presented in the next section.
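
For monitoring purposes, the cross-entropy G can be evaluated numerically from the
two distributions over visible states. The sketch below assumes the standard
Boltzmann machine form, G = sum over α of P+(Vα) ln[P+(Vα) / P-(Vα)], and uses a
small floor `eps` (an assumption, not part of the original formulation) to avoid
division by zero when the free-running net never visits a state.

```python
import numpy as np

def cross_entropy_G(p_plus, p_minus, eps=1e-12):
    """G = sum_a P+(a) * ln(P+(a) / P-(a)); states with P+(a) = 0 contribute nothing."""
    p_plus = np.asarray(p_plus, dtype=float)
    p_minus = np.asarray(p_minus, dtype=float)
    used = p_plus > 0
    ratio = p_plus[used] / np.maximum(p_minus[used], eps)
    return float(np.sum(p_plus[used] * np.log(ratio)))
```

G is zero when the free-running distribution matches the environmental one and
positive otherwise, so a learning curve can be plotted as G against the number of
learning cycles.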

5  Simulation  Results 

In these simulations we use the noisy threshold (N-T) annealing scheme [1] and use the
annealing time interval in E space as an annealing parameter in G space. All the
simulations learn to solve a 4-2-4 encoder problem [3] in the context of the Boltzmann machine
formalism, i.e. a three layered net, of the kind described in the
architecture section, learns to form its own internal representations of the associations
presented to it. For all simulations, the net reaches equilibrium 100 times (25 times for
each input vector) for collecting the statistics of Pij during the input and output
clamping phase. The situation is the same for collecting the statistics of P'ij. All annealing
schedules are stated in the corresponding figures in the t@T notation explained
earlier [1]. The noise we used is binary noise whose amplitude is either T or -T and
is decreased gradually in time and terminated at T0. Figure 4 shows the results
of the linear weight learning scheme, and Figure 5 shows the results of the binary
weight learning scheme. All figures show the results for 12 runs. The parameter M
we used is 0.1. Only two annealing schedules in E space are used for the annealing
in G space. During the first half of the total number of learning cycles the short time
interval annealing schedule is employed, and during the latter half of the learning cycles
the long time interval annealing schedule is employed. These results show the viability
of the annealing scheme in G space, and also show the viability of the binary weight
stochastic learning scheme.
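
The simulation setup can be sketched as follows. The sketch assumes the usual form of
the encoder task (one unit on per input vector, with the output required to copy the
input) and reads each schedule entry as a pair of duration and noise amplitude; the
function names and the example schedule values are hypothetical and are not the
schedules used for Figures 4 and 5.

```python
import numpy as np

def encoder_patterns(n=4):
    """Input/output pairs of an n-2-n encoder: each pattern has a single unit on."""
    eye = np.eye(n, dtype=int)
    return [(eye[i], eye[i]) for i in range(n)]

def expand_schedule(pairs):
    """Turn (duration, amplitude) pairs into one noise amplitude per time step."""
    temps = []
    for duration, temperature in pairs:
        temps.extend([float(temperature)] * int(duration))
    return temps

def schedule_for_cycle(cycle, total_cycles, short, long):
    """Short E-space schedule in the first half of learning, long one afterwards."""
    return short if cycle < total_cycles // 2 else long

# Hypothetical example values, for illustration only.
SHORT = [(2, 3.0), (1, 1.5), (1, 1.0)]
LONG = [(4, 3.0), (4, 1.5), (4, 1.0)]
```

Switching from `SHORT` to `LONG` halfway through learning lowers the effective noise
in G space, which is the two-stage annealing in G space described above.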

6  Conclusions 

We have described an architecture for partitioning an opto-electronic analog of a neural
net to form a multilayered net that permits self-organization and learning when
computer controlled nonvolatile spatial light modulators are utilized to realize the required
plasticity. The focus here is on stochastic learning as opposed to deterministic learning
because it may provide insight into the role of noise in biological neural nets. We also
described opto-electronic architectures that can be used for fast determination of the
energy of the net if such information is needed and for adaptive deterministic
deformation of the net's energy landscape to control its behavior. We show that departure from
the conventional simulated annealing algorithm through the use of noisy thresholding
in opto-electronic schemes promises to markedly accelerate the annealing process, and
make stochastic learning practical. Employing the noisy thresholding scheme, a small
opto-electronic neural net (of 16 neurons) was found to reach a global energy minimum
or one close to it in about 32 neuron time constants. We also show that a binary weight
learning algorithm can be used in the context of a modified Boltzmann machine. This
paves the way to the use of nonvolatile binary spatial light modulators to realize the
required plasticity in such stochastic learning nets. Such nets, having learned their
environmental inputs, can be "frozen" for use as associative memories of the entities
learned by merely removing injected noise from the net. Noise injection for annealing
returns the nets to a "soft" mode for learning new environmental inputs.

Fig. 1. Architecture for opto-electronic analog of layered self-programming
net: (a) partitioning concept and (b) arrangement for rapid
determination of the net's global energy E for use in learning
by simulated annealing.


Fig. 4.

Linear weight learning curve with N-T
algorithm. Annealing schedule in E
space: during the 0-25th learning cycle
2@3, 1@1.5, 1@1, and [illegible]@0.1;
during the 26-50th learning cycle [illegible],
6@0.3, 6@0.5, 6@0.1, and 6@0.
This is an annealing scheme in G space.

Fig. 5.

Binary weight learning curve with N-T
algorithm. Annealing schedule in E
space: during the 0-50th learning cycle
2@3, 1@1.5, 1@1, and 2@0.1;
during the 51-100th learning cycle 4@1,
4@0.3, 4@0.5, 4@0.1, and 4@0.
This is an annealing scheme in G space.


Fig. 2. Pictorial view of opto-electronic neural
net of 16 unipolar binary neurons with random
ternary weights used to verify fast annealing
by noisy thresholding.

Fig. 3. Cooling profiles


